INTRODUCTION


In this book, we present the basic principles that underlie the analysis and 
design of digital communication systems. The subject of digital communica- 
tions involves the transmission of information in digital form from a source 
that generates the information to one or more destinations. Of particular 
importance in the analysis and design of communication systems are the 
characteristics of the physical channels through which the information is 
transmitted. The characteristics of the channel generally affect the design of 
the basic building blocks of the communication system. Below, we describe the 
elements of a communication system and their functions. 


1-1 ELEMENTS OF A DIGITAL COMMUNICATION 
SYSTEM 

Figure 1-1-1 illustrates the functional diagram and the basic elements of a 
digital communication system. The source output may be either an analog 
signal, such as an audio or video signal, or a digital signal, such as the output of a
teletype machine, that is discrete in time and has a finite number of output 
characters. In a digital communication system, the messages produced by the 
source are converted into a sequence of binary digits. Ideally, we should like to 
represent the source output (message) by as few binary digits as possible. In 
other words, we seek an efficient representation of the source output that 
results in little or no redundancy. The process of efficiently converting the 
output of either an analog or digital source into a sequence of binary digits is 
called source encoding or data compression. 

FIGURE 1-1-1 Basic elements of a digital communication system.

The sequence of binary digits from the source encoder, which we call the
information sequence, is passed to the channel encoder. The purpose of the
channel encoder is to introduce, in a controlled manner, some redundancy in 
the binary information sequence that can be used at the receiver to overcome 
the effects of noise and interference encountered in the transmission of the 
signal through the channel. Thus, the added redundancy serves to increase the
reliability of the received data and improve the fidelity of the received signal.
In effect, redundancy in the information sequence aids the receiver in decoding 
the desired information sequence. For example, a (trivial) form of encoding of 
the binary information sequence is simply to repeat each binary digit m times, 
where m is some positive integer. More sophisticated (nontrivial) encoding 
involves taking k information bits at a time and mapping each k-bit sequence
into a unique n-bit sequence, called a code word. The amount of redundancy
introduced by encoding the data in this manner is measured by the ratio n/k.
The reciprocal of this ratio, namely k/n, is called the rate of the code or,
simply, the code rate. 
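
As an illustration of the trivial repetition code described above, the following minimal Python sketch repeats each bit m times and decodes by majority vote. The function names and the majority-vote decoder are our own assumptions for illustration; the text does not prescribe a particular decoding rule.

```python
def repetition_encode(bits, m):
    """Repeat each information bit m times (k = 1, n = m, so the rate is 1/m)."""
    return [b for b in bits for _ in range(m)]


def repetition_decode(coded, m):
    """Majority-vote each group of m received bits (choose m odd to avoid ties)."""
    return [1 if sum(coded[i:i + m]) > m // 2 else 0
            for i in range(0, len(coded), m)]


def code_rate(k, n):
    """Code rate k/n: k information bits carried by each n-bit code word."""
    return k / n


info = [1, 0, 1]
tx = repetition_encode(info, 3)      # nine coded bits for three information bits
rx = tx.copy()
rx[1] ^= 1                           # hypothetical single channel error
decoded = repetition_decode(rx, 3)   # the redundancy lets the receiver correct it
```

With m = 3 the redundancy ratio is n/k = 3 and the code rate is 1/3; a single error within any group of three bits is corrected by the vote.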

The binary sequence at the output of the channel encoder is passed to the 
digital modulator, which serves as the interface to the communications channel. 
Since nearly all of the communication channels encountered in practice are 
capable of transmitting electrical signals (waveforms), the primary purpose of 
the digital modulator is to map the binary information sequence into signal 
waveforms. To elaborate on this point, let us suppose that the coded 
information sequence is to be transmitted one bit at a time at some uniform 
rate R bits/s. The digital modulator may simply map the binary digit 0 into a
waveform s_0(t) and the binary digit 1 into a waveform s_1(t). In this manner,
each bit from the channel encoder is transmitted separately. We call this binary
modulation. Alternatively, the modulator may transmit b coded information
bits at a time by using M = 2^b distinct waveforms s_i(t), i = 0, 1, ..., M - 1, one
waveform for each of the 2^b possible b-bit sequences. We call this M-ary
modulation (M > 2). Note that a new b-bit sequence enters the modulator
every b/R seconds. Hence, when the channel bit rate R is fixed, the amount of
time available to transmit one of the M waveforms corresponding to a b-bit
sequence is b times the time period in a system that uses binary modulation.
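
The grouping of coded bits into M-ary symbols can be sketched as follows. This is only an illustration: the natural binary mapping from a b-bit block to a waveform index, and the function names, are assumptions of ours rather than a mapping specified in the text.

```python
def bits_to_symbols(bits, b):
    """Group bits into b-bit blocks and map each block to an index in 0..M-1, M = 2**b."""
    if len(bits) % b != 0:
        raise ValueError("number of bits must be a multiple of b")
    symbols = []
    for i in range(0, len(bits), b):
        index = 0
        for bit in bits[i:i + b]:
            index = (index << 1) | bit   # natural binary mapping (an assumption)
        symbols.append(index)
    return symbols


def symbol_duration(b, R):
    """Time available per waveform, b/R seconds, at a channel bit rate of R bits/s."""
    return b / R


coded_bits = [0, 1, 1, 0, 1, 1, 0, 0]
symbols = bits_to_symbols(coded_bits, b=2)   # M = 4 waveforms
```

Each index selects one of the M waveforms s_i(t); for b = 2 the modulator has twice as long to send each waveform as a binary modulator operating at the same bit rate R.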

The communication channel is the physical medium that is used to send the 
signal from the transmitter to the receiver. In wireless transmission, the 
channel may be the atmosphere (free space). On the other hand, telephone 
channels usually employ a variety of physical media, including wire lines, 
optical fiber cables, and wireless (microwave radio). Whatever the physical 
medium used for transmission of the information, the essential feature is that 
the transmitted signal is corrupted in a random manner by a variety of possible 
mechanisms, such as additive thermal noise generated by electronic devices, 
man-made noise, e.g., automobile ignition noise, and atmospheric noise, e.g.,
electrical lightning discharges during thunderstorms. 

At the receiving end of a digital communications system, the digital 
demodulator processes the channel-corrupted transmitted waveform and re- 
duces the waveforms to a sequence of numbers that represent estimates of the 
transmitted data symbols (binary or M-ary). This sequence of numbers is 
passed to the channel decoder, which attempts to reconstruct the original 
information sequence from knowledge of the code used by the channel 
encoder and the redundancy contained in the received data. 

A measure of how well the demodulator and decoder perform is the 
frequency with which errors occur in the decoded sequence. More precisely, 
the average probability of a bit-error at the output of the decoder is a measure 
of the performance of the demodulator-decoder combination. In general, the 
probability of error is a function of the code characteristics, the types of 
waveforms used to transmit the information over the channel, the transmitter 
power, the characteristics of the channel, i.e., the amount of noise, the nature 
of the interference, etc., and the method of demodulation and decoding. These 
items and their effect on performance will be discussed in detail in subsequent 
chapters. 

As a final step, when an analog output is desired, the source decoder accepts 
the output sequence from the channel decoder and, from knowledge of the 
source encoding method used, attempts to reconstruct the original signal from 
the source. Due to channel decoding errors and possible distortion introduced 
by the source encoder and, perhaps, the source decoder, the signal at the 
output of the source decoder is an approximation to the original source output. 
The difference or some function of the difference between the original signal 
and the reconstructed signal is a measure of the distortion introduced by the 
digital communication system. 

1-2 COMMUNICATION CHANNELS AND THEIR 
CHARACTERISTICS 

As indicated in the preceding discussion, the communication channel provides 
the connection between the transmitter and the receiver. The physical channel 
may be a pair of wires that carry the electrical signal, or an optical fiber that
carries the information on a modulated light beam, or an underwater ocean 
channel in which the information is transmitted acoustically, or free space over 
which the information-bearing signal is radiated by use of an antenna. Other 
media that can be characterized as communication channels are data storage 
media, such as magnetic tape, magnetic disks, and optical disks. 

One common problem in signal transmission through any channel is additive 
noise. In general, additive noise is generated internally by components such as 
resistors and solid-state devices used to implement the communication system. 
This is sometimes called thermal noise. Other sources of noise and interference 
may arise externally to the system, such as interference from other users of the 
channel. When such noise and interference occupy the same frequency band as 
the desired signal, their effect can be minimized by proper design of the
transmitted signal and its demodulator at the receiver. Other types of signal 
degradations that may be encountered in transmission over the channel are 
signal attenuation, amplitude and phase distortion, and multipath distortion. 

The effects of noise may be minimized by increasing the power in the 
transmitted signal. However, equipment and other practical constraints limit 
the power level in the transmitted signal. Another basic limitation is the 
available channel bandwidth. A bandwidth constraint is usually due to the 
physical limitations of the medium and the electronic components used to 
implement the transmitter and the receiver. These two limitations result in 
constraining the amount of data that can be transmitted reliably over any 
communications channel as we shall observe in later chapters. Below, we 
describe some of the important characteristics of several communication 
channels. 

Wireline Channels The telephone network makes extensive use of wire 
lines for voice signal transmission, as well as data and video transmission. 
Twisted-pair wire lines and coaxial cable are basically guided electromagnetic 
channels that provide relatively modest bandwidths. Telephone wire generally 
used to connect a customer to a central office has a bandwidth of several 
hundred kilohertz (kHz). On the other hand, coaxial cable has a usable 
bandwidth of several megahertz (MHz). Figure 1-2-1 illustrates the frequency 
range of guided electromagnetic channels, which include waveguides and 
optical fibers.

Signals transmitted through such channels are distorted in both amplitude
and phase and further corrupted by additive noise. Twisted-pair wireline 
channels are also prone to crosstalk interference from physically adjacent 
channels. Because wireline channels carry a large percentage of our daily 
communications around the country and the world, much research has been 
performed on the characterization of their transmission properties and on 
methods for mitigating the amplitude and phase distortion encountered in 
signal transmission. In Chapter 9, we describe methods for designing optimum 
transmitted signals and their demodulation; in Chapters 10 and 11, we 
consider the design of channel equalizers that compensate for amplitude and
phase distortion on these channels. 

Fiber Optic Channels Optical fibers offer the communications system 
designer a channel bandwidth that is several orders of magnitude larger than 
coaxial cable channels. During the past decade, optical fiber cables have been 
developed that have a relatively low signal attenuation, and highly reliable 
photonic devices have been developed for signal generation and signal 
detection. These technological advances have resulted in a rapid deployment of 
optical fiber channels, both in domestic telecommunication systems and
for trans-Atlantic and trans-Pacific communications. With the large bandwidth
available on fiber optic channels, it is possible for telephone companies to offer
subscribers a wide array of telecommunication services, including voice, data, 
facsimile, and video. 

The transmitter or modulator in a fiber optic communication system is a 
light source, either a light-emitting diode (LED) or a laser. Information is 
transmitted by varying (modulating) the intensity of the light source with the 
message signal. The light propagates through the fiber as a light wave and is 
amplified periodically (in the case of digital transmission, it is detected and 
regenerated by repeaters) along the transmission path to compensate for signal
attenuation. At the receiver, the light intensity is detected by a photodiode, 
whose output is an electrical signal that varies in direct proportion to the 
power of the light impinging on the photodiode. Sources of noise in fiber optic 
channels are photodiodes and electronic amplifiers. 

It is envisioned that optical fiber channels will replace nearly all wireline 
channels in the telephone network by the turn of the century. 

Wireless Electromagnetic Channels In wireless communication systems, 
electromagnetic energy is coupled to the propagation medium by an antenna 
which serves as the radiator. The physical size and the configuration of the 
antenna depend primarily on the frequency of operation. To obtain efficient 
radiation of electromagnetic energy, the antenna must be longer than 1/10 of the
wavelength. Consequently, a radio station transmitting in the AM frequency
band, say at f_c = 1 MHz (corresponding to a wavelength of λ = c/f_c = 300 m),
requires an antenna of at least 30 m. Other important characteristics and
attributes of antennas for wireless transmission are described in Chapter 5. 
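
The arithmetic behind the AM example can be checked with a short Python sketch. The function names are hypothetical, and the speed of light is rounded to 3 x 10^8 m/s to match the 300 m figure quoted above.

```python
C_LIGHT = 3.0e8  # speed of light in m/s, rounded (assumption matching the 300 m figure)


def wavelength_m(f_hz):
    """Wavelength in meters for a carrier frequency given in Hz."""
    return C_LIGHT / f_hz


def min_antenna_length_m(f_hz, divisor=10.0):
    """Rule of thumb from the text: the antenna should exceed 1/10 of the wavelength."""
    return wavelength_m(f_hz) / divisor


am_wavelength = wavelength_m(1.0e6)          # carrier at 1 MHz
am_antenna = min_antenna_length_m(1.0e6)     # minimum length for efficient radiation
```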

Figure 1-2-2 illustrates the various frequency bands of the electromagnetic 
spectrum. The mode of propagation of electromagnetic waves in the atmo- 
sphere and in free space may be subdivided into three categories, namely, 
ground-wave propagation, sky-wave propagation, and line-of-sight (LOS) 
propagation. In the VLF and audio frequency bands, where the wavelengths 
exceed 10 km, the earth and the ionosphere act as a waveguide for electromag- 
netic wave propagation. In these frequency ranges, communication signals 
practically propagate around the globe. For this reason, these frequency bands 
are primarily used to provide navigational aids from shore to ships around the 
world. The channel bandwidths available in these frequency bands are 
relatively small (usually 1-10% of the center frequency), and hence the 
information that is transmitted through these channels is of relatively slow 
speed and generally confined to digital transmission. A dominant type of noise 
at these frequencies is generated from thunderstorm activity around the globe, 
especially in tropical regions. Interference results from the many users of these 
frequency bands. 

Ground-wave propagation, as illustrated in Fig. 1-2-3, is the dominant mode
of propagation for frequencies in the MF band (0.3-3 MHz). This is the
frequency band used for AM broadcasting and maritime radio broadcasting. In
AM broadcasting, the range with ground-wave propagation of even the more
powerful radio stations is limited to about 150 km. Atmospheric noise,
man-made noise, and thermal noise from electronic components at the receiver
are dominant disturbances for signal transmission in the MF band.

FIGURE 1-2-2 Frequency range for wireless electromagnetic channels. [Adapted from Carlson (1975),
2nd edition, © McGraw-Hill Book Company Co. Reprinted with permission of the publisher.]

Sky-wave propagation, as illustrated in Fig. 1-2-4, results from transmitted
signals being reflected (bent or refracted) from the ionosphere, which consists 
of several layers of charged particles ranging in altitude from 50 to 400 km 
above the surface of the earth. During the daytime hours, the heating of the 
lower atmosphere by the sun causes the formation of the lower layers at 
altitudes below 120 km. These lower layers, especially the D-layer, serve to 
absorb frequencies below 2 MHz, thus severely limiting sky-wave propagation 
of AM radio broadcast. However, during the night-time hours, the electron 
density in the lower layers of the ionosphere drops sharply and the frequency 
absorption that occurs during the daytime is significantly reduced. As a 
consequence, powerful AM radio broadcast stations can propagate over large 
distances via sky wave over the F-layer of the ionosphere, which ranges from 
140 to 400 km above the surface of the earth. 

A frequently occurring problem with electromagnetic wave propagation via 
sky wave in the HF frequency range is signal multipath. Signal multipath occurs 
when the transmitted signal arrives at the receiver via multiple propagation 
paths at different delays. It generally results in intersymbol interference in a 
digital communication system. Moreover, the signal components arriving via 
different propagation paths may add destructively, resulting in a phenomenon 
called signal fading, which most people have experienced when listening to a 
distant radio station at night when sky wave is the dominant propagation 
mode. Additive noise at HF is a combination of atmospheric noise and thermal 
noise. 

Sky-wave ionospheric propagation ceases to exist at frequencies above 
approximately 30 MHz, which is the end of the HF band. However, it is 
possible to have ionospheric scatter propagation at frequencies in the range 
30-60 MHz, resulting from signal scattering from the lower ionosphere. It is 
also possible to communicate over distances of several hundred miles by use of 
tropospheric scattering at frequencies in the range 40-300 MHz. Troposcatter 
results from signal scattering due to particles in the atmosphere at altitudes of 
10 miles or less. Generally, ionospheric scatter and tropospheric scatter 
involve large signal propagation losses and require a large amount of
transmitter power and relatively large antennas. 

Frequencies above 30 MHz propagate through the ionosphere with rela- 
tively little loss and make satellite and extraterrestrial communications 
possible. Hence, at frequencies in the VHF band and higher, the dominant 
mode of electromagnetic propagation is line-of-sight (LOS) propagation. For 
terrestrial communication systems, this means that the transmitter and receiver 
antennas must be in direct LOS with relatively little or no obstruction. For this 
reason, television stations transmitting in the VHF and UHF frequency bands 
mount their antennas on high towers to achieve a broad coverage area. 

In general, the coverage area for LOS propagation is limited by the 
curvature of the earth. If the transmitting antenna is mounted at a height of h m
above the surface of the earth, the distance to the radio horizon, assuming no
physical obstructions such as mountains, is approximately d = √(15h) km. For
example, a TV antenna mounted on a tower of 300 m in height provides a 
coverage of approximately 67 km. As another example, microwave radio relay 
systems used extensively for telephone and video transmission at frequencies 
above 1 GHz have antennas mounted on tall towers or on the top of tall 
buildings. 
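
The radio-horizon approximation d = √(15h) km, with h in meters, is easy to evaluate numerically. The sketch below uses a hypothetical function name and simply reproduces the TV-tower example.

```python
import math


def radio_horizon_km(h_m):
    """Approximate LOS radio horizon in km for an antenna height h_m in meters."""
    if h_m < 0:
        raise ValueError("antenna height must be non-negative")
    return math.sqrt(15.0 * h_m)


tv_tower_range = radio_horizon_km(300.0)   # about 67 km for a 300 m tower
```

Because the range grows only as the square root of the height, quadrupling the tower height merely doubles the coverage distance.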

The dominant noise limiting the performance of a communication system in 
VHF and UHF frequency ranges is thermal noise generated in the receiver 
front end and cosmic noise picked up by the antenna. At frequencies in the 
SHF band above 10 GHz, atmospheric conditions play a major role in signal 
propagation. For example, at 10 GHz, the attenuation ranges from about 
0.003 dB/km in light rain to about 0.3 dB/km in heavy rain. At 100 GHz, the 
attenuation ranges from about 0.1 dB/km in light rain to about 6 dB/km in 
heavy rain. Hence, in this frequency range, heavy rain introduces extremely 
high propagation losses that can result in service outages (total breakdown in 
the communication system). 

At frequencies above the EHF (extremely high frequency) band, we have 
the infrared and visible light regions of the electromagnetic spectrum, which 
can be used to provide LOS optical communication in free space. To date, 
these frequency bands have been used in experimental communication 
systems, such as satellite-to-satellite links. 

Underwater Acoustic Channels Over the past few decades, ocean ex- 
ploration activity has been steadily increasing. Coupled with this increase is the 
need to transmit data, collected by sensors placed under water, to the surface 
of the ocean. From there, it is possible to relay the data via a satellite to a data 
collection center. 

Electromagnetic waves do not propagate over long distances under water 
except at extremely low frequencies. However, the transmission of signals at 
such low frequencies is prohibitively expensive because of the large and 
powerful transmitters required. The attenuation of electromagnetic waves in 
water can be expressed in terms of the skin depth, which is the distance a signal 
is attenuated by 1/e. For sea water, the skin depth δ = 250/√f, where f is
expressed in Hz and δ is in m. For example, at 10 kHz, the skin depth is 2.5 m.
In contrast, acoustic signals propagate over distances of tens and even 
hundreds of kilometers. 
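
To see how quickly electromagnetic signals are attenuated in sea water, the skin-depth relation δ = 250/√f (f in Hz, δ in m) can be evaluated as in the following sketch. The function name is hypothetical.

```python
import math


def skin_depth_m(f_hz):
    """Skin depth in sea water, in meters, for a frequency f_hz in Hz (f_hz > 0)."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return 250.0 / math.sqrt(f_hz)


depth_10khz = skin_depth_m(1.0e4)    # the 10 kHz example from the text
depth_100hz = skin_depth_m(1.0e2)    # a lower frequency penetrates further
```

Even at 100 Hz the skin depth is only a few tens of meters, which is why acoustic signalling is preferred under water.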

An underwater acoustic channel is characterized as a multipath channel due 
to signal reflections from the surface and the bottom of the sea. Because of 
wave motion, the signal multipath components undergo time-varying propaga- 
tion delays that result in signal fading. In addition, there is frequency- 
dependent attenuation, which is approximately proportional to the square of 
the signal frequency. The sound velocity is nominally about 1500 m/s, but the 
actual value will vary either above or below the nominal value depending on 
the depth at which the signal propagates. 

Ambient ocean acoustic noise is caused by shrimp, fish, and various 
mammals. Near harbors, there is also man-made acoustic noise in addition to 
the ambient noise. In spite of this hostile environment, it is possible to design 
and implement efficient and highly reliable underwater acoustic communica- 
tion systems for transmitting digital signals over large distances. 


Storage Channels Information storage and retrieval systems constitute a 
very significant part of data-handling activities on a daily basis. Magnetic tape, 
including digital audio tape and video tape, magnetic disks used for storing 
large amounts of computer data, optical disks used for computer data storage, 
and compact disks are examples of data storage systems that can be 
characterized as communication channels. The process of storing data on a 
magnetic tape or a magnetic or optical disk is equivalent to transmitting a 
signal over a telephone or a radio channel. The readback process and the 
signal processing involved in storage systems to recover the stored information 
are equivalent to the functions performed by a receiver in a telephone or radio 
communication system to recover the transmitted information. 

Additive noise generated by the electronic components and interference 
from adjacent tracks is generally present in the readback signal of a storage 
system, just as is the case in a telephone or a radio communication system. 

The amount of data that can be stored is generally limited by the size of the 
disk or tape and the density (number of bits stored per square inch) that can be 
achieved by the write/read electronic systems and heads. For example, a
packing density of 10^9 bits per square inch has been recently demonstrated in
an experimental magnetic disk storage system. (Current commercial magnetic 
storage products achieve a much lower density.) The speed at which data can 
be written on a disk or tape and the speed at which it can be read back are also 
limited by the associated mechanical and electrical subsystems that constitute 
an information storage system. 

Channel coding and modulation are essential components of a well-designed 
digital magnetic or optical storage system. In the readback process, the signal is 
demodulated and the added redundancy introduced by the channel encoder is 
used to correct errors in the readback signal. 

1-3 MATHEMATICAL MODELS FOR
COMMUNICATION CHANNELS 

In the design of communication systems for transmitting information through 
physical channels, we find it convenient to construct mathematical models that 
reflect the most important characteristics of the transmission medium. Then, 
the mathematical model for the channel is used in the design of the channel 
encoder and modulator at the transmitter and the demodulator and channel 
decoder at the receiver. Below, we provide a brief description of the 
channel models that are frequently used to characterize many of the physical 
channels that we encounter in practice. 

The Additive Noise Channel The simplest mathematical model for a 
communication channel is the additive noise channel, illustrated in Fig. 1-3-1. 
In this model, the transmitted signal s(t) is corrupted by an additive random
noise process n(t). Physically, the additive noise process may arise from 
electronic components and amplifiers at the receiver of the communication 
system, or from interference encountered in transmission (as in the case of 
radio signal transmission). 

If the noise is introduced primarily by electronic components and amplifiers 
at the receiver, it may be characterized as thermal noise. This type of noise is 
characterized statistically as a gaussian noise process. Hence, the resulting 
mathematical model for the channel is usually called the additive gaussian 
noise channel. Because this channel model applies to a broad class of physical 
communication channels and because of its mathematical tractability, this is 
the predominant channel model used in our communication system analysis 
and design. Channel attenuation is easily incorporated into the model. When 
the signal undergoes attenuation in transmission through the channel, the 
received signal is 

$$ r(t) = a\,s(t) + n(t) \tag{1-3-1} $$

where a is the attenuation factor. 
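
A discrete-time sketch of the additive gaussian noise channel r(t) = a s(t) + n(t) is given below. It is an illustration under our own assumptions: the signal is represented by samples, the noise samples are independent zero-mean gaussian values with standard deviation sigma, and the function and parameter names are hypothetical.

```python
import random


def additive_noise_channel(s, a=1.0, sigma=0.1, rng=None):
    """Return samples r[i] = a * s[i] + n[i], with n[i] zero-mean gaussian noise."""
    if rng is None:
        rng = random.Random()
    return [a * x + rng.gauss(0.0, sigma) for x in s]


s = [1.0, -1.0, 1.0, 1.0, -1.0]                              # antipodal signal samples
r_clean = additive_noise_channel(s, a=0.5, sigma=0.0)         # attenuation only
r_noisy = additive_noise_channel(s, a=0.5, sigma=0.1,
                                 rng=random.Random(1))        # seeded for repeatability
```

Setting sigma to zero isolates the effect of the attenuation factor a; a nonzero sigma perturbs each received sample independently.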

The Linear Filter Channel In some physical channels, such as wireline 
telephone channels, filters are used to ensure that the transmitted signals do 
not exceed specified bandwidth limitations and thus do not interfere with one 
another. Such channels are generally characterized mathematically as linear 
filter channels with additive noise, as illustrated in Fig. 1-3-2. Hence, if the
channel input is the signal s(t), the channel output is the signal

$$ r(t) = s(t) \star c(t) + n(t) = \int_{-\infty}^{\infty} c(\tau)\, s(t - \tau)\, d\tau + n(t) \tag{1-3-2} $$

where c(t) is the impulse response of the linear filter and ★ denotes
convolution.

FIGURE 1-3-1 The additive noise channel.

FIGURE 1-3-2 The linear filter channel with additive noise.
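
In discrete time, the linear filter channel amounts to convolving the input samples with the sampled impulse response and adding noise. The sketch below is an illustration only: the finite-length impulse response, the sample-based representation, and the function name are assumptions of ours.

```python
def linear_filter_channel(s, c, noise=None):
    """Discrete-time analogue of r = s * c + n (full convolution plus optional noise)."""
    n_out = len(s) + len(c) - 1
    r = [0.0] * n_out
    for n in range(n_out):
        for k, c_k in enumerate(c):
            if 0 <= n - k < len(s):
                r[n] += c_k * s[n - k]       # sum over k of c[k] * s[n - k]
    if noise is not None:
        if len(noise) != n_out:
            raise ValueError("noise must have len(s) + len(c) - 1 samples")
        r = [x + w for x, w in zip(r, noise)]
    return r


# Two-tap channel: a direct component plus a half-amplitude echo one sample later.
r = linear_filter_channel([1.0, 0.0, 0.0, -1.0], [1.0, 0.5])
```

The echo of each input sample spills into the following sample, which is the discrete-time picture of the amplitude and phase distortion introduced by the filter.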

The Linear Time-Variant Filter Channel Physical channels such as under- 
water acoustic channels and ionospheric radio channels that result in time- 
variant multipath propagation of the transmitted signal may be characterized 
mathematically as time-variant linear filters. Such linear filters are characterized
by a time-variant channel impulse response c(τ; t), where c(τ; t) is the
response of the channel at time t due to an impulse applied at time t - τ. Thus,
τ represents the “age” (elapsed-time) variable. The linear time-variant filter
channel with additive noise is illustrated in Fig. 1-3-3. For an input signal s(t),
the channel output signal is

$$ r(t) = s(t) \star c(\tau; t) + n(t) = \int_{-\infty}^{\infty} c(\tau; t)\, s(t - \tau)\, d\tau + n(t) \tag{1-3-3} $$

A good model for multipath signal propagation through physical channels, 
such as the ionosphere (at frequencies below 30 MHz) and mobile cellular 
radio channels, is a special case of (1-3-3) in which the time-variant impulse 
response has the form 

$$ c(\tau; t) = \sum_{k=1}^{L} a_k(t)\, \delta(\tau - \tau_k) \tag{1-3-4} $$

where the {a_k(t)} represent the possibly time-variant attenuation factors for the
L multipath propagation paths and {τ_k} are the corresponding time delays. If
(1-3-4) is substituted into (1-3-3), the received signal has the form

$$ r(t) = \sum_{k=1}^{L} a_k(t)\, s(t - \tau_k) + n(t) \tag{1-3-5} $$

Hence, the received signal consists of L multipath components, where each
component is attenuated by a_k(t) and delayed by τ_k.

FIGURE 1-3-3 Linear time-variant filter channel with additive noise.

The three mathematical models described above adequately characterize the 
great majority of the physical channels encountered in practice. These three 
channel models are used in this text for the analysis and design of communica- 
tion systems. 
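
The multipath special case of the time-variant model, in which the received signal is a sum of L attenuated and delayed copies of the transmitted signal plus noise, can be sketched in discrete time as follows. The representation is our own assumption: delays are whole numbers of samples, each path gain is a function of the sample index, and the names are hypothetical.

```python
def multipath_channel(s, paths, noise=None):
    """r[n] = sum over paths of gain_k(n) * s[n - delay_k], plus optional noise[n].

    paths is a list of (gain_function, delay_in_samples) pairs.
    """
    r = []
    for n in range(len(s)):
        total = 0.0
        for gain, delay in paths:
            if 0 <= n - delay < len(s):
                total += gain(n) * s[n - delay]
        if noise is not None:
            total += noise[n]
        r.append(total)
    return r


# Two paths: a direct path and a half-amplitude path delayed by two samples.
paths = [(lambda n: 1.0, 0), (lambda n: 0.5, 2)]
r = multipath_channel([1.0, 0.0, 0.0, 0.0], paths)
```

Replacing a constant gain with a function that varies with n models fading on that path, and setting L = 1 with zero delay recovers the additive noise channel with attenuation.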


1-4 A HISTORICAL PERSPECTIVE IN THE 
DEVELOPMENT OF DIGITAL COMMUNICATIONS 

It is remarkable that the earliest form of electrical communication, namely 
telegraphy, was a digital communication system. The electric telegraph was 
developed by Samuel Morse and was demonstrated in 1837. Morse devised the 
variable-length binary code in which letters of the English alphabet are 
represented by a sequence of dots and dashes (code words). In this code, more 
frequently occurring letters are represented by short code words, while letters 
occurring less frequently are represented by longer code words. Thus, the 
Morse code was the precursor of the variable-length source coding methods
described in Chapter 3. 

Nearly 40 years later, in 1875, Emile Baudot devised a code for telegraphy 
in which every letter was encoded into fixed-length binary code words of length 
5. In the Baudot code, binary code elements are of equal length and designated 
as mark and space. 

Although Morse is responsible for the development of the first electrical 
digital communication system (telegraphy), the beginnings of what we now 
regard as modern digital communications stem from the work of Nyquist
(1924), who investigated the problem of determining the maximum signaling 
rate that can be used over a telegraph channel of a given bandwidth without 
intersymbol interference. He formulated a model of a telegraph system in 
which a transmitted signal has the general form 

$$ s(t) = \sum_{n} a_n\, g(t - nT) \tag{1-4-1} $$

where g(t) represents a basic pulse shape and {a_n} is the binary data sequence
of {±1} transmitted at a rate of 1/T bits/s. Nyquist set out to determine the
optimum pulse shape that was bandlimited to W Hz and maximized the bit rate 
under the constraint that the pulse caused no intersymbol interference at the 
sampling times kT, k = 0, ±1, ±2, .... His studies led him to conclude that the
maximum pulse rate is 2W pulses/s. This rate is now called the Nyquist rate.
Moreover, this pulse rate can be achieved by using the pulses g(t) =
(sin 2πWt)/(2πWt). This pulse shape allows recovery of the data without
intersymbol interference at the sampling instants. Nyquist's result is equivalent
to a version of the sampling theorem for bandlimited signals, which was later
stated precisely by Shannon (1948). The sampling theorem states that a signal 
of bandwidth W can be reconstructed from samples taken at the Nyquist rate 
of 2 W samples/s using the interpolation formula 



$$ s(t) = \sum_{n} s\!\left(\frac{n}{2W}\right) \frac{\sin[2\pi W(t - n/2W)]}{2\pi W(t - n/2W)} \tag{1-4-2} $$
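
The interpolation formula can be tried numerically with the sketch below. It is only an approximation of the sampling theorem as stated: the infinite sum is truncated to the samples supplied, and the function name and test values are our own.

```python
import math


def sinc_interpolate(samples, W, t):
    """Evaluate the truncated sum of samples[n] * sin(x_n) / x_n at time t,

    where samples[n] = s(n / (2W)) and x_n = 2*pi*W*(t - n / (2W)).
    """
    total = 0.0
    for n, s_n in enumerate(samples):
        x = 2.0 * math.pi * W * (t - n / (2.0 * W))
        total += s_n * (1.0 if x == 0.0 else math.sin(x) / x)
    return total


W = 4.0                                   # bandwidth in Hz, so samples are 1/8 s apart
samples = [0.0, 1.0, 0.5, -0.25]
value_at_sample = sinc_interpolate(samples, W, 1 / (2.0 * W))
```

At a sampling instant t = n/2W every term but one vanishes, so the formula returns the sample itself. This is the same property that lets Nyquist's pulse avoid intersymbol interference.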


In light of Nyquist's work, Hartley (1928) considered the issue of the
amount of data that can be transmitted reliably over a bandlimited channel
when multiple amplitude levels are used. Due to the presence of noise and
other interference, Hartley postulated that the receiver can reliably estimate
the received signal amplitude to some accuracy, say A_δ. This investigation led
Hartley to conclude that there is a maximum data rate that can be
communicated reliably over a bandlimited channel when the maximum signal
amplitude is limited to A_max (fixed power constraint) and the amplitude
resolution is A_δ.

Another significant advance in the development of communications was the
work of Wiener (1942), who considered the problem of estimating a desired
signal waveform s(t) in the presence of additive noise n(t), based on
observation of the received signal r(t) = s(t) + n(t). This problem arises in
signal demodulation. Wiener determined the linear filter whose output is the
best mean-square approximation to the desired signal s(t). The resulting filter
is called the optimum linear (Wiener) filter.

Hartley’s and Nyquist’s results on the maximum transmission rate of digital 
information were precursors to the work of Shannon (1948a, b), who established
the mathematical foundations for information transmission and derived
the fundamental limits for digital communication systems. In his pioneering
work, Shannon formulated the basic problem of reliable transmission of
information in statistical terms, using probabilistic models for information 
sources and communication channels. Based on such a statistical formulation, 
he adopted a logarithmic measure for the information content of a source. He 
also demonstrated that the effect of a transmitter power constraint, a 
bandwidth constraint, and additive noise can be associated with the channel 
and incorporated into a single parameter, called the channel capacity. For 
example, in the case of an additive white (spectrally flat) gaussian noise
interference, an ideal bandlimited channel of bandwidth W has a capacity C
given by

$$ C = W \log_2\left(1 + \frac{P}{W N_0}\right) \ \text{bits/s} \tag{1-4-3} $$

where P is the average transmitted power and $N_0$ is the power spectral density
of the additive noise. The significance of the channel capacity is as follows: If
the information rate R from the source is less than C (R < C), then it is
theoretically possible to achieve reliable (error-free) transmission through the 
channel by appropriate coding. On the other hand, if R>C, reliable 
transmission is not possible regardless of the amount of signal processing 
performed at the transmitter and receiver. Thus, Shannon established basic 
limits on communication of information, and gave birth to a new field that is 
now called information theory. 
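As an illustration of (1-4-3) and of the comparison between R and C, the following short sketch computes the capacity of an ideal bandlimited channel with additive white gaussian noise. The function names and the numerical values in the usage lines are assumptions made for this example only.

```python
import math


def awgn_capacity(W, P, N0):
    """Capacity C in bits/s of an ideal bandlimited AWGN channel, per (1-4-3).

    W is the bandwidth in Hz, P the average transmitted power, and N0 the
    power spectral density of the additive noise.
    """
    if W <= 0 or N0 <= 0 or P < 0:
        raise ValueError("W and N0 must be positive and P nonnegative")
    # log1p keeps precision when P/(W*N0) is small
    return W * math.log1p(P / (W * N0)) / math.log(2.0)


def reliable_transmission_possible(R, W, P, N0):
    """True when the information rate R is below the capacity C."""
    return R < awgn_capacity(W, P, N0)


if __name__ == "__main__":
    # Illustrative numbers: W = 3000 Hz and P/(W*N0) = 1000 (30 dB)
    W, N0 = 3000.0, 1.0
    P = 1000.0 * W * N0
    print(awgn_capacity(W, P, N0))
    print(reliable_transmission_possible(9600.0, W, P, N0))
```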

Another important contribution to the field of digital communication is the 
work of Kotelnikov (1947), who provided a coherent analysis of the various
digital communication systems based on a geometrical approach. Kotelnikov's
approach was later expanded by Wozencraft and Jacobs (1965). 

Following Shannon’s publications came the classic work of Hamming
(1950) on error-detecting and error-correcting codes to combat the detrimental 
effects of channel noise. Hamming’s work stimulated many researchers in the 
years that followed, and a variety of new and powerful codes were discovered, 
many of which are used today in the implementation of modern communica- 
tion systems. 

The increase in demand for data transmission during the last three to four 
decades, coupled with the development of more sophisticated integrated 
circuits, has led to the development of very efficient and more reliable digital 
communication systems. In the course of these developments, Shannon’s
original results and the generalization of his results on maximum transmission 
limits over a channel and on bounds on the performance achieved have served 
as benchmarks for any given communication system design. The theoretical 
limits derived by Shannon and other researchers that contributed to the 
development of information theory serve as an ultimate goal in the continuing 
efforts to design and develop more efficient digital communication systems. 

There have been many new advances in the area of digital communications 
following the early work of Shannon, Kotelnikov, and Hamming. Some of the 
most notable developments are the following: 

• The development of new block codes by Muller (1954), Reed (1954), 
Reed and Solomon (1960), Bose and Ray-Chaudhuri (1960a, b), and Goppa 
(1970, 1971). 

• The development of concatenated codes by Forney (1966). 

• The development of computationally efficient decoding of BCH codes,
e.g., the Berlekamp-Massey algorithm (see Chien, 1964; Berlekamp, 1968).

• The development of convolutional codes and decoding algorithms by 
Wozencraft and Reiffen (1961), Fano (1963), Zigangirov (1966), Jelinek 
(1969), Forney (1970, 1972), and Viterbi (1967, 1971). 

• The development of trellis-coded modulation by Ungerboeck (1982),
Forney et al. (1984), Wei (1987), and others.

• The development of efficient source encoding algorithms for data
compression, such as those devised by Ziv and Lempel (1977, 1978) and Linde
et al. (1980).

1-5 OVERVIEW OF THE BOOK 

Chapter 2 presents a brief review of the basic notions in the theory of 
probability and random processes. Our primary objectives in this chapter are 
to present results that are used throughout the book and to establish some 
necessary notation. 

In Chapter 3, we provide an introduction to source coding for discrete and 
analog sources. Included in this chapter are the Huffman coding algorithm and 
the Lempel-Ziv algorithm for discrete sources, and scalar and vector quantization
techniques for analog sources.

Chapter 4 treats the characterization of communication signals and systems 
from a mathematical viewpoint. Included in this chapter is a geometric 
representation of signal waveforms used for digital communications. 

Chapters 5-8 are focused on modulation/demodulation and channel 
coding/decoding for the additive, white gaussian noise channel. The emphasis
is on optimum demodulation and decoding techniques and their performance. 

The design of efficient modulators and demodulators for linear filter 
channels with distortion is treated in Chapters 9-11. The focus is on signal 
design and on channel equalization methods to compensate for the channel 
distortion. 

The final four chapters treat several more specialized topics. Chapter 12 
treats multichannel and multicarrier communication systems. Chapter 13 is 
focused on spread spectrum signals for digital communications and their 
performance characteristics. Chapter 14 provides an in-depth treatment of
communication through fading multipath channels. Included in this treatment 
is a description of channel characterization, signal design and demodulation 
techniques and their performance, and coding/decoding techniques and their 
performance. The last chapter of the book is focused on multiuser communica- 
tion systems and multiple access methods. 

1-6 BIBLIOGRAPHICAL NOTES AND REFERENCES 

There are several historical treatments regarding the development of radio and 
telecommunications during the past century. These may be found in the books 
by McMahon (1984), Millman (1984), and Ryder and Fink (1984). We have 
already cited the classical works of Nyquist (1924), Hartley (1928), Kotelnikov 
(1947), Shannon (1948), and Hamming (1950), as well as some of the more 
important advances that have occurred in the field since 1950. The collected 
papers by Shannon have been published by IEEE Press in a book edited by 
Sloane and Wyner (1993). Other collected works published by the IEEE Press 
that might be of interest to the reader are Key Papers in the Development of 
Coding Theory, edited by Berlekamp (1974), and Key Papers in the 
Development of Information Theory, edited by Slepian (1974).


PROBABILITY AND STOCHASTIC PROCESSES


The theory of probability and stochastic processes is an essential mathematical 
tool in the design of digital communication systems. This subject is important 
in the statistical modeling of sources that generate the information, in the 
digitization of the source output, in the characterization of the channel through 
which the digital information is transmitted, in the design of the receiver that 
processes the information-bearing signal from the channel, and in the 
evaluation of the performance of the communication system. Our coverage of 
this rich and interesting subject is brief and limited in scope. We present a 
number of definitions and basic concepts in the theory of probability and 
stochastic processes and we derive several results that are important in the 
design of efficient digital communication systems and in the evaluation of their 
performance. 

We anticipate that most readers have had some prior exposure to the theory 
of probability and stochastic processes, so that our treatment serves primarily 
as a review. Some readers, however, who have had no previous exposure may 
find the presentation in this chapter extremely brief. These readers will benefit 
from additional reading of engineering-level treatments of the subject found in 
the texts by Davenport and Root (1958), Davenport (1970), Papoulis (1984), 
Helstrom (1991), and Leon-Garcia (1994). 


2-1 PROBABILITY 

Let us consider an experiment, such as the rolling of a die, with a number of 
possible outcomes. The sample space S of the experiment consists of the set of 
all possible outcomes. In the case of the die,

$$ S = \{1, 2, 3, 4, 5, 6\} \tag{2-1-1} $$



where the integers 1, ..., 6 represent the number of dots on the six faces of the
die. These six possible outcomes are the sample points of the experiment. An
event is a subset of S, and may consist of any number of sample points. For
example, the event A defined as

$$ A = \{2, 4\} \tag{2-1-2} $$

consists of the outcomes 2 and 4. The complement of the event A, denoted by
$\bar{A}$, consists of all the sample points in S that are not in A and, hence,

$$ \bar{A} = \{1, 3, 5, 6\} \tag{2-1-3} $$

Two events are said to be mutually exclusive if they have no sample points in
common, that is, if the occurrence of one event excludes the occurrence of the
other. For example, if A is defined as in (2-1-2) and the event B is defined as

$$ B = \{1, 3, 6\} \tag{2-1-4} $$

then A and B are mutually exclusive events. Similarly, A and $\bar{A}$ are mutually
exclusive events.

The union (sum) of two events is an event that consists of all the sample
points in the two events. For example, if B is the event defined in (2-1-4) and C
is the event defined as

$$ C = \{1, 2, 3\} \tag{2-1-5} $$

then the union of B and C, denoted by $B \cup C$, is the event

$$ D = B \cup C = \{1, 2, 3, 6\} \tag{2-1-6} $$

Similarly, $A \cup \bar{A} = S$, where S is the entire sample space or the certain event.
On the other hand, the intersection of two events is an event that consists of
the points that are common to the two events. Thus, if $E = B \cap C$ represents
the intersection of the events B and C, defined by (2-1-4) and (2-1-5),
respectively, then

$$ E = \{1, 3\} $$

When the events are mutually exclusive, the intersection is the null event,
denoted as $\emptyset$. For example, $A \cap B = \emptyset$ and $A \cap \bar{A} = \emptyset$. The definitions of
union and intersection are extended to more than two events in a straightforward
manner.
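The event operations described above map directly onto set operations. The sketch below, which is an illustration and not part of the original text, rebuilds the events of (2-1-1) through (2-1-6) with Python sets; the variable names are chosen for this example.

```python
# Sample space and events for a single toss of a die, as in (2-1-1)-(2-1-5)
S = {1, 2, 3, 4, 5, 6}
A = {2, 4}
B = {1, 3, 6}
C = {1, 2, 3}

A_bar = S - A   # complement of A within S
D = B | C       # union of B and C
E = B & C       # intersection of B and C


def mutually_exclusive(x, y):
    """Two events are mutually exclusive when they share no sample points."""
    return not (x & y)


if __name__ == "__main__":
    print(A_bar, D, E)
    print(mutually_exclusive(A, B), mutually_exclusive(A, A_bar))
```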

Associated with each event A contained in S is its probability P(A). In the
assignment of probabilities to events, we adopt an axiomatic viewpoint. That
is, we postulate that the probability of the event A satisfies the condition
$P(A) \ge 0$. We also postulate that the probability of the sample space (certain
event) is P(S) = 1. The third postulate deals with the probability of mutually
exclusive events. Suppose that $A_i$, i = 1, 2, ..., are a (possibly infinite) number
of events in the sample space S such that

$$ A_i \cap A_j = \emptyset, \qquad i \ne j = 1, 2, \ldots $$

Then the probability of the union of these mutually exclusive events satisfies
the condition

$$ P\left(\bigcup_i A_i\right) = \sum_i P(A_i) \tag{2-1-7} $$

For example, in a roll of a fair die, each possible outcome is assigned the
probability 1/6. The event A defined by (2-1-2) consists of two mutually exclusive
subevents or outcomes, and, hence, P(A) = 2/6 = 1/3. Also, the probability of the
event $A \cup B$, where A and B are the mutually exclusive events defined by
(2-1-2) and (2-1-4), respectively, is P(A) + P(B) = 1/3 + 1/2 = 5/6.
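The fair-die probabilities in this example can be checked with exact rational arithmetic. This is a small sketch under the assumption that every sample point is equally likely; the helper `prob` is a name introduced here for illustration.

```python
from fractions import Fraction

S = frozenset(range(1, 7))   # sample space of a fair die
A = frozenset({2, 4})        # event (2-1-2)
B = frozenset({1, 3, 6})     # event (2-1-4)


def prob(event):
    """Probability of an event when all sample points are equally likely."""
    return Fraction(len(event & S), len(S))


if __name__ == "__main__":
    print(prob(A), prob(B), prob(A | B))
```

Because A and B are mutually exclusive, `prob(A | B)` equals `prob(A) + prob(B)`, in agreement with (2-1-7).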


Joint Events and Joint Probabilities Instead of dealing with a single 
experiment, let us perform two experiments and consider their outcomes. For 
example, the two experiments may be two separate tosses of a single die or a 
single toss of two dice. In either case, the sample space S consists of the 36 
two-tuples (i, j), where i, j = 1, 2, ..., 6. If the dice are fair, each point in the
sample space is assigned the probability 1/36. We may now consider joint events,
such as {i is even, j = 3}, and determine the associated probabilities of such
events from knowledge of the probabilities of the sample points.

In general, if one experiment has the possible outcomes $A_i$, i = 1, 2, ..., n,
and the second experiment has the possible outcomes $B_j$, j = 1, 2, ..., m, then
the combined experiment has the possible joint outcomes $(A_i, B_j)$, i =
1, 2, ..., n, j = 1, 2, ..., m. Associated with each joint outcome $(A_i, B_j)$ is the
joint probability $P(A_i, B_j)$, which satisfies the condition

$$ 0 \le P(A_i, B_j) \le 1 $$

Assuming that the outcomes $B_j$, j = 1, 2, ..., m, are mutually exclusive, it
follows that

$$ \sum_{j=1}^{m} P(A_i, B_j) = P(A_i) \tag{2-1-8} $$

Similarly, if the outcomes $A_i$, i = 1, 2, ..., n, are mutually exclusive then

$$ \sum_{i=1}^{n} P(A_i, B_j) = P(B_j) \tag{2-1-9} $$

Furthermore, if all the outcomes of the two experiments are mutually exclusive
then

$$ \sum_{i=1}^{n} \sum_{j=1}^{m} P(A_i, B_j) = 1 \tag{2-1-10} $$

The generalization of the above treatment to more than two experiments is 
straightforward. 
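The relations (2-1-8) through (2-1-10) can be verified on the two-dice example. The sketch below is illustrative only: it assumes fair dice, stores the joint probabilities in a dictionary named `JOINT`, and uses helper names invented for this example.

```python
from fractions import Fraction
from itertools import product

# Joint probabilities of the 36 two-tuples (i, j) for two fair dice
JOINT = {(i, j): Fraction(1, 36) for i, j in product(range(1, 7), repeat=2)}


def marginal_first(i):
    """P(A_i): sum of the joint probabilities over j, as in (2-1-8)."""
    return sum(JOINT[(i, j)] for j in range(1, 7))


def marginal_second(j):
    """P(B_j): sum of the joint probabilities over i, as in (2-1-9)."""
    return sum(JOINT[(i, j)] for i in range(1, 7))


def joint_event(predicate):
    """Probability of the joint event selected by predicate(i, j)."""
    return sum(p for (i, j), p in JOINT.items() if predicate(i, j))


if __name__ == "__main__":
    print(marginal_first(2), marginal_second(3))
    print(joint_event(lambda i, j: i % 2 == 0 and j == 3))
```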


Conditional Probabilities Consider a combined experiment in which a 
joint event occurs with probability P(A, B). Suppose that the event B has 
occurred and we wish to determine the probability of occurrence of the event 
A. This is called the conditional probability of the event A given the occurrence 
of the event B and is defined as 

$$ P(A \mid B) = \frac{P(A, B)}{P(B)} \tag{2-1-11} $$

provided P(B) > 0. In a similar manner, the probability of the event B
conditioned on the occurrence of the event A is defined as

$$ P(B \mid A) = \frac{P(A, B)}{P(A)} \tag{2-1-12} $$

provided P(A) > 0. The relations in (2-1-11) and (2-1-12) may also be
expressed as

$$ P(A, B) = P(A \mid B)P(B) = P(B \mid A)P(A) \tag{2-1-13} $$

The relations in (2-1-11), (2-1-12), and (2-1-13) also apply to a single
experiment in which A and B are any two events defined on the sample space S
and P(A, B) is interpreted as the probability of $A \cap B$. That is, P(A, B)
denotes the probability of the simultaneous occurrence of A and B. For example,
consider the events B and C given by (2-1-4) and (2-1-5), respectively, for the
single toss of a die. The joint event consists of the sample points {1, 3}. The
conditional probability of the event C given that B occurred is

$$ P(C \mid B) = \frac{2/6}{3/6} = \frac{2}{3} $$

In a single experiment, we observe that when two events A and B are
mutually exclusive, $A \cap B = \emptyset$ and, hence, $P(A \mid B) = 0$. Also, if A is a subset
of B then $A \cap B = A$ and, hence,

$$ P(A \mid B) = \frac{P(A)}{P(B)} $$

On the other hand, if B is a subset of A, we have $A \cap B = B$ and, hence,

$$ P(A \mid B) = \frac{P(B)}{P(B)} = 1 $$
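The definition (2-1-11) and the three special cases just noted can be checked on the single-die events. The following is an illustrative sketch assuming a fair die; `cond_prob` and `EVEN` are names introduced for this example.

```python
from fractions import Fraction

S = frozenset(range(1, 7))      # single toss of a fair die
A = frozenset({2, 4})           # event (2-1-2)
B = frozenset({1, 3, 6})        # event (2-1-4)
C = frozenset({1, 2, 3})        # event (2-1-5)
EVEN = frozenset({2, 4, 6})     # hypothetical event that contains A


def prob(event):
    return Fraction(len(event & S), len(S))


def cond_prob(a, b):
    """P(a | b) = P(a, b) / P(b), defined only when P(b) > 0, as in (2-1-11)."""
    if prob(b) == 0:
        raise ValueError("conditioning event must have nonzero probability")
    return prob(a & b) / prob(b)


if __name__ == "__main__":
    print(cond_prob(C, B))      # joint event {1, 3} given B
    print(cond_prob(A, B))      # mutually exclusive events
    print(cond_prob(A, EVEN))   # A is a subset of EVEN
    print(cond_prob(EVEN, A))   # conditioning event is a subset
```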


An extremely useful relationship for conditional probabilities is Bayes’
theorem, which states that if $A_i$, i = 1, 2, ..., n, are mutually exclusive events
such that

$$ \bigcup_{i=1}^{n} A_i = S $$

and B is an arbitrary event with nonzero probability then

$$ P(A_i \mid B) = \frac{P(A_i, B)}{P(B)} = \frac{P(B \mid A_i)P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)P(A_j)} \tag{2-1-14} $$


We use this formula in Chapter 5 to derive the structure of the optimum
receiver for a digital communication system in which the events $A_i$, i =
1, 2, ..., n, represent the possible transmitted messages in a given time
interval, $P(A_i)$ represent their a priori probabilities, B represents the received
signal, which consists of the transmitted message (one of the $A_i$) corrupted by
noise, and $P(A_i \mid B)$ is the a posteriori probability of $A_i$ conditioned on having
observed the received signal B.
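A small numerical sketch of (2-1-14) follows. The two-message setting and the numbers used (a priori probabilities 3/5 and 2/5, likelihoods 9/10 and 1/5) are hypothetical values chosen only to illustrate the computation; they do not come from the text.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """A posteriori probabilities P(A_i | B) from Bayes' theorem (2-1-14).

    priors[i] is P(A_i) and likelihoods[i] is P(B | A_i).
    """
    joint = [l * p for p, l in zip(priors, likelihoods)]   # P(B | A_i) P(A_i)
    total = sum(joint)                                      # P(B)
    if total == 0:
        raise ValueError("B must have nonzero probability")
    return [j / total for j in joint]


if __name__ == "__main__":
    # Hypothetical values for two possible transmitted messages
    priors = [Fraction(3, 5), Fraction(2, 5)]
    likelihoods = [Fraction(9, 10), Fraction(1, 5)]
    print(posterior(priors, likelihoods))
```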


Statistical Independence The statistical independence of two or more 
events is another important concept in probability theory. It usually arises 
when we consider two or more experiments or repeated trials of a single 
experiment. To explain this concept, we consider two events A and B and their 
conditional probability P(A | B), which is the probability of occurrence of A
given that B has occurred. Suppose that the occurrence of A does not depend
on the occurrence of B. That is,

$$ P(A \mid B) = P(A) \tag{2-1-15} $$

Substitution of (2-1-15) into (2-1-13) yields the result

$$ P(A, B) = P(A)P(B) \tag{2-1-16} $$

That is, the joint probability of the events A and B factors into the product of
the elementary or marginal probabilities P(A) and P(B). When the events A
and B satisfy the relation in (2-1-16), they are said to be statistically
independent.

For example, consider two successive experiments in tossing a die. Let A
represent the even-numbered sample points {2, 4, 6} in the first toss and B
represent the even-numbered possible outcomes {2, 4, 6} in the second toss. For
a fair die, we assign the probabilities P(A) = 1/2 and P(B) = 1/2. Now, the joint
probability of the joint event “even-numbered outcome on the first toss and
even-numbered outcome on the second toss” is just the probability of the nine
pairs of outcomes (i, j), i = 2, 4, 6, j = 2, 4, 6, which is 9/36 = 1/4. Also,

$$ P(A, B) = P(A)P(B) = \frac{1}{4} $$

Thus, the events A and B are statistically independent. Similarly, we may say 
that the outcomes of the two experiments are statistically independent. 

The definition of statistical independence can be extended to three or more
events. Three statistically independent events $A_1$, $A_2$, and $A_3$ must satisfy the
following conditions:

$$
\begin{aligned}
P(A_1, A_2) &= P(A_1)P(A_2) \\
P(A_1, A_3) &= P(A_1)P(A_3) \\
P(A_2, A_3) &= P(A_2)P(A_3) \\
P(A_1, A_2, A_3) &= P(A_1)P(A_2)P(A_3)
\end{aligned}
\tag{2-1-17}
$$

In the general case, the events $A_i$, i = 1, 2, ..., n, are statistically independent
provided that the probabilities of the joint events taken 2, 3, 4, ..., and n at a
time factor into the product of the probabilities of the individual events.
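The fourth condition in (2-1-17) is not implied by the first three. The sketch below gives a standard illustration that is not taken from the text: two tosses of a fair coin, with the hypothetical events "first toss is a head", "second toss is a head", and "both tosses agree". Each pair of events factors, but the triple does not.

```python
from fractions import Fraction
from itertools import product

# Four equally likely outcomes of two tosses of a fair coin
OUTCOMES = list(product("HT", repeat=2))


def prob(event):
    """Probability of an event given as a predicate on an outcome."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))


def joint(*events):
    """Probability that all of the given events occur together."""
    return prob(lambda o: all(e(o) for e in events))


def a1(o):
    return o[0] == "H"       # first toss is a head


def a2(o):
    return o[1] == "H"       # second toss is a head


def a3(o):
    return o[0] == o[1]      # both tosses agree


if __name__ == "__main__":
    print(joint(a1, a2), joint(a1, a3), joint(a2, a3))
    print(joint(a1, a2, a3), prob(a1) * prob(a2) * prob(a3))
```

The three events are therefore pairwise independent but not statistically independent in the sense of (2-1-17).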


2-1-1 Random Variables, Probability Distributions, and 
Probability Densities 

Given an experiment having a sample space S and elements $s \in S$, we define a
function X(s) whose domain is S and whose range is a set of numbers on the
real line. The function X(s) is called a random variable. For example, if we flip
a coin the possible outcomes are head (H) and tail (T), so S contains two
points labeled H and T. Suppose we define a function X(s) such that

$$ X(s) = \begin{cases} 1 & (s = \mathrm{H}) \\ -1 & (s = \mathrm{T}) \end{cases} \tag{2-1-18} $$


Thus we have mapped the two possible outcomes of the coin-flipping
experiment into the two points (±1) on the real line. Another experiment is
the toss of a die with possible outcomes S = {1, 2, 3, 4, 5, 6}. A random variable
defined on this sample space may be X(s) = s, in which case the outcomes of
the experiment are mapped into the integers 1, ..., 6, or, perhaps, $X(s) = s^2$,
in which case the possible outcomes are mapped into the integers
{1, 4, 9, 16, 25, 36}. These are examples of discrete random variables.

Although we have used as examples experiments that have a finite set of 
possible outcomes, there are many physical systems (experiments) that 
generate continuous outputs (outcomes). For example, the noise voltage 
generated by an electronic amplifier has a continuous amplitude. Consequently,
the sample space S of voltage amplitudes $v \in S$ is continuous and so is
the mapping X(v) = v. In such a case, the random variable† X is said to be a
continuous random variable.

Given a random variable X, let us consider the event $\{X \le x\}$, where x is any
real number in the interval $(-\infty, \infty)$. We write the probability of this event as
$P(X \le x)$ and denote it simply by F(x), i.e.,

$$ F(x) = P(X \le x) \qquad (-\infty < x < \infty) \tag{2-1-19} $$

The function F(x) is called the probability distribution function of the random
variable X. It is also called the cumulative distribution function (cdf). Since
F(x) is a probability, its range is limited to the interval $0 \le F(x) \le 1$. In fact,
$F(-\infty) = 0$ and $F(\infty) = 1$. For example, the discrete random variable generated
by flipping a fair coin and defined by (2-1-18) has the cdf shown in Fig.
2-1-1(a). There are two discontinuities or jumps in F(x), one at x = −1 and
one at x = 1. Similarly, the random variable X(s) = s generated by tossing a
fair die has the cdf shown in Fig. 2-1-1(b). In this case F(x) has six jumps, one
at each of the points x = 1, ..., 6.
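The staircase cdfs of the two discrete examples can be tabulated directly from (2-1-19). The following sketch is an illustration only; it represents each discrete random variable as a dictionary of values and probabilities, which is a representation chosen for this example.

```python
from fractions import Fraction


def cdf(pmf, x):
    """F(x) = P(X <= x) for a discrete random variable given as {value: probability}."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))


# Coin-flip variable of (2-1-18) and die variable X(s) = s, both assumed fair
COIN = {1: Fraction(1, 2), -1: Fraction(1, 2)}
DIE = {s: Fraction(1, 6) for s in range(1, 7)}

if __name__ == "__main__":
    for x in (-2, -1, 0, 1, 2):
        print(x, cdf(COIN, x))
    for x in (0, 1, 3.5, 6):
        print(x, cdf(DIE, x))
```

The cdf is constant between the possible values and jumps by the probability of each value at that value.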


FIGURE 2-1-1 Examples of the cumulative distribution functions of two discrete random variables: (a) the coin-flip variable of (2-1-18); (b) the die-toss variable X(s) = s.

† The random variable X(s) will be written simply as X.

FIGURE 2-1-2 An example of the cumulative distribution function of a
continuous random variable.

The cdf of a continuous random variable typically appears as shown in Fig.
2-1-2. This is a smooth, nondecreasing function of x. In some practical 
problems, we may also encounter a random variable of a mixed type. The cdf 
of such a random variable is a smooth, nondecreasing function in certain parts 
of the real line and contains jumps at a number of discrete values of x. An 
example of such a cdf is illustrated in Fig. 2-1-3. 

The derivative of the cdf F(x), denoted as p(x), is called the probability
density function (pdf) of the random variable X. Thus, we have

$$ p(x) = \frac{dF(x)}{dx} \qquad (-\infty < x < \infty) \tag{2-1-20} $$

or, equivalently

$$ F(x) = \int_{-\infty}^{x} p(u)\,du \qquad (-\infty < x < \infty) \tag{2-1-21} $$

Since F(x) is a nondecreasing function, it follows that $p(x) \ge 0$. When the
random variable is discrete or of a mixed type, the pdf contains impulses at the
points of discontinuity of F(x). In such cases, the discrete part of p(x) may be
expressed as

$$ p(x) = \sum_{i=1}^{n} P(X = x_i)\,\delta(x - x_i) \tag{2-1-22} $$

where $x_i$, i = 1, 2, ..., n, are the possible discrete values of the random
variable; $P(X = x_i)$, i = 1, 2, ..., n, are the probabilities, and $\delta(x)$ denotes an
impulse at x = 0.

FIGURE 2-1-3 An example of the cumulative distribution function of a random
variable of a mixed type.

Often we are faced with the problem of determining the probability that a
random variable X falls in an interval $(x_1, x_2)$, where $x_2 > x_1$. To determine the
probability of this event, let us begin with the event $\{X \le x_2\}$. The event can
always be expressed as the union of two mutually exclusive events $\{X \le x_1\}$ and
$\{x_1 < X \le x_2\}$. Hence the probability of the event $\{X \le x_2\}$ can be expressed as
the sum of the probabilities of the mutually exclusive events. Thus we have

$$
\begin{aligned}
P(X \le x_2) &= P(X \le x_1) + P(x_1 < X \le x_2) \\
F(x_2) &= F(x_1) + P(x_1 < X \le x_2)
\end{aligned}
$$

or, equivalently,

$$ P(x_1 < X \le x_2) = F(x_2) - F(x_1) = \int_{x_1}^{x_2} p(x)\,dx \tag{2-1-23} $$

In other words, the probability of the event $\{x_1 < X \le x_2\}$ is simply the area
under the pdf in the range $x_1 < X \le x_2$.
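Relation (2-1-23) can be checked numerically for a particular continuous random variable. The sketch below assumes, purely for illustration, the pdf p(x) = exp(−x) for x ≥ 0, whose cdf is F(x) = 1 − exp(−x); the midpoint-rule integrator is likewise a simple choice made for this example.

```python
import math


def pdf(x):
    """Assumed pdf for this illustration: p(x) = exp(-x) for x >= 0."""
    return math.exp(-x) if x >= 0.0 else 0.0


def cdf(x):
    """Corresponding cdf: F(x) = 1 - exp(-x) for x >= 0."""
    return 1.0 - math.exp(-x) if x >= 0.0 else 0.0


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


def interval_probability(x1, x2):
    """P(x1 < X <= x2) = F(x2) - F(x1), as in (2-1-23)."""
    return cdf(x2) - cdf(x1)


if __name__ == "__main__":
    print(interval_probability(0.5, 2.0), integrate(pdf, 0.5, 2.0))
```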


Multiple Random Variables, Joint Probability Distributions, and Joint 
Probability Densities In dealing with combined experiments or repeated 
trials of a single experiment, we encounter multiple random variables and their 
cdfs and pdfs. Multiple random variables are basically multidimensional 
functions defined on a sample space of a combined experiment. Let us begin 
with two random variables $X_1$ and $X_2$, each of which may be continuous,
discrete, or mixed. The joint cumulative distribution function (joint cdf) for the
two random variables is defined as

$$ F(x_1, x_2) = P(X_1 \le x_1, X_2 \le x_2) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2} p(u_1, u_2)\,du_1\,du_2 \tag{2-1-24} $$

where $p(x_1, x_2)$ is the joint probability density function (joint pdf). The latter
may also be expressed in the form

$$ p(x_1, x_2) = \frac{\partial^2}{\partial x_1\,\partial x_2} F(x_1, x_2) \tag{2-1-25} $$

When the joint pdf $p(x_1, x_2)$ is integrated over one of the variables, we
obtain the pdf of the other variable. That is,

$$
\begin{aligned}
\int_{-\infty}^{\infty} p(x_1, x_2)\,dx_1 &= p(x_2) \\
\int_{-\infty}^{\infty} p(x_1, x_2)\,dx_2 &= p(x_1)
\end{aligned}
\tag{2-1-26}
$$

The pdfs $p(x_1)$ and $p(x_2)$ obtained from integrating over one of the variables
are called marginal pdfs. Furthermore, if $p(x_1, x_2)$ is integrated over both
variables, we obtain

$$ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(x_1, x_2)\,dx_1\,dx_2 = F(\infty, \infty) = 1 \tag{2-1-27} $$

We also note that $F(-\infty, -\infty) = F(-\infty, x_2) = F(x_1, -\infty) = 0$.

The generalization of the above expressions to multidimensional random 
variables is straightforward. Suppose that $X_i$, i = 1, 2, ..., n, are random
variables with a joint cdf defined as

$$
\begin{aligned}
F(x_1, x_2, \ldots, x_n) &= P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n) \\
&= \int_{-\infty}^{x_1}\int_{-\infty}^{x_2}\cdots\int_{-\infty}^{x_n} p(u_1, u_2, \ldots, u_n)\,du_1\,du_2 \cdots du_n
\end{aligned}
\tag{2-1-28}
$$

where $p(x_1, x_2, \ldots, x_n)$ is the joint pdf. By taking the partial derivatives of
$F(x_1, x_2, \ldots, x_n)$ given by (2-1-28), we obtain

$$ p(x_1, x_2, \ldots, x_n) = \frac{\partial^n}{\partial x_1\,\partial x_2 \cdots \partial x_n} F(x_1, x_2, \ldots, x_n) \tag{2-1-29} $$

Any number of variables in $p(x_1, x_2, \ldots, x_n)$ can be eliminated by integrating
over these variables. For example, integration over $x_2$ and $x_3$ yields

$$ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(x_1, x_2, x_3, \ldots, x_n)\,dx_2\,dx_3 = p(x_1, x_4, \ldots, x_n) \tag{2-1-30} $$

It also follows that $F(x_1, \infty, \infty, x_4, \ldots, x_n) = F(x_1, x_4, x_5, \ldots, x_n)$ and
$F(x_1, -\infty, -\infty, x_4, \ldots, x_n) = 0$.


Conditional Probability Distribution Functions Let us consider two random
variables $X_1$ and $X_2$ with joint pdf $p(x_1, x_2)$. Suppose that we wish to
determine the probability that the random variable $X_1$ satisfies $X_1 \le x_1$,
conditioned on

$$ x_2 - \Delta x_2 < X_2 \le x_2 $$

where $\Delta x_2$ is some positive increment. That is, we wish to determine the
probability of the event $(X_1 \le x_1 \mid x_2 - \Delta x_2 < X_2 \le x_2)$. Using the relations
established earlier for the conditional probability of an event, the probability
of the event $(X_1 \le x_1 \mid x_2 - \Delta x_2 < X_2 \le x_2)$ can be expressed as the probability
of the joint event $(X_1 \le x_1,\ x_2 - \Delta x_2 < X_2 \le x_2)$ divided by the probability of
the event $(x_2 - \Delta x_2 < X_2 \le x_2)$. Thus

$$
\begin{aligned}
P(X_1 \le x_1 \mid x_2 - \Delta x_2 < X_2 \le x_2)
&= \frac{\int_{-\infty}^{x_1}\int_{x_2 - \Delta x_2}^{x_2} p(u_1, u_2)\,du_1\,du_2}{\int_{x_2 - \Delta x_2}^{x_2} p(u_2)\,du_2} \\
&= \frac{F(x_1, x_2) - F(x_1, x_2 - \Delta x_2)}{F(x_2) - F(x_2 - \Delta x_2)}
\end{aligned}
\tag{2-1-31}
$$

Assuming that the pdfs $p(x_1, x_2)$ and $p(x_2)$ are continuous functions over the
interval $(x_2 - \Delta x_2, x_2)$, we may divide both numerator and denominator in
(2-1-31) by $\Delta x_2$ and take the limit as $\Delta x_2 \to 0$. Thus we obtain

$$
\begin{aligned}
P(X_1 \le x_1 \mid X_2 = x_2) \equiv F(x_1 \mid x_2)
&= \frac{\partial F(x_1, x_2)/\partial x_2}{\partial F(x_2)/\partial x_2} \\
&= \frac{\partial\left[\int_{-\infty}^{x_1}\int_{-\infty}^{x_2} p(u_1, u_2)\,du_1\,du_2\right]/\partial x_2}{\partial\left[\int_{-\infty}^{x_2} p(u_2)\,du_2\right]/\partial x_2} \\
&= \frac{\int_{-\infty}^{x_1} p(u_1, x_2)\,du_1}{p(x_2)}
\end{aligned}
\tag{2-1-32}
$$

which is the conditional cdf of the random variable $X_1$ given the random
variable $X_2$. We observe that $F(-\infty \mid x_2) = 0$ and $F(\infty \mid x_2) = 1$. By
differentiating (2-1-32) with respect to $x_1$, we obtain the corresponding pdf
$p(x_1 \mid x_2)$ in the form

$$ p(x_1 \mid x_2) = \frac{p(x_1, x_2)}{p(x_2)} \tag{2-1-33} $$

Alternatively, we may express the joint pdf $p(x_1, x_2)$ in terms of the
conditional pdfs, $p(x_1 \mid x_2)$ or $p(x_2 \mid x_1)$, as

$$ p(x_1, x_2) = p(x_1 \mid x_2)\,p(x_2) = p(x_2 \mid x_1)\,p(x_1) \tag{2-1-34} $$
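To make (2-1-32) and (2-1-33) concrete, the following sketch evaluates them numerically for an assumed joint pdf, p(x1, x2) = x1 + x2 on the unit square and zero elsewhere. This joint pdf and the midpoint-rule integrator are illustrative choices that do not appear in the text.

```python
def joint_pdf(x1, x2):
    """Assumed joint pdf: p(x1, x2) = x1 + x2 on the unit square, zero elsewhere."""
    return x1 + x2 if 0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0 else 0.0


def integrate(f, a, b, steps=200):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


def marginal_x2(x2):
    """p(x2): the joint pdf integrated over x1, as in (2-1-26)."""
    return integrate(lambda u1: joint_pdf(u1, x2), 0.0, 1.0)


def conditional_pdf(x1, x2):
    """p(x1 | x2) = p(x1, x2) / p(x2), as in (2-1-33)."""
    return joint_pdf(x1, x2) / marginal_x2(x2)


def conditional_cdf(x1, x2):
    """F(x1 | x2) from (2-1-32), valid here for 0 <= x1 <= 1."""
    return integrate(lambda u1: joint_pdf(u1, x2), 0.0, x1) / marginal_x2(x2)


if __name__ == "__main__":
    x2 = 0.25
    print(marginal_x2(x2))              # analytically 1/2 + x2
    print(conditional_cdf(0.5, x2))     # analytically 1/3 for x2 = 0.25
    print(integrate(lambda x1: conditional_pdf(x1, x2), 0.0, 1.0))
```

For each fixed x2 the conditional pdf integrates to one over x1, which is the statement F(∞ | x2) = 1.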

The extension of the relations given above to multidimensional random
variables is also easily accomplished. Beginning with the joint pdf of the
random variables $X_i$, i = 1, 2, ..., n, we may write

$$ p(x_1, x_2, \ldots, x_n) = p(x_1, x_2, \ldots, x_k \mid x_{k+1}, \ldots, x_n)\,p(x_{k+1}, \ldots, x_n) \tag{2-1-35} $$

where k is any integer in the range 1 < k < n. The joint conditional cdf
corresponding to the pdf $p(x_1, x_2, \ldots, x_k \mid x_{k+1}, \ldots, x_n)$ is

$$ F(x_1, x_2, \ldots, x_k \mid x_{k+1}, \ldots, x_n) = \frac{\int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_k} p(u_1, u_2, \ldots, u_k, x_{k+1}, \ldots, x_n)\,du_1\,du_2 \cdots du_k}{p(x_{k+1}, \ldots, x_n)} \tag{2-1-36} $$

This conditional cdf satisfies the properties previously established for these
functions, such as

$$
\begin{aligned}
F(\infty, x_2, \ldots, x_k \mid x_{k+1}, \ldots, x_n) &= F(x_2, x_3, \ldots, x_k \mid x_{k+1}, \ldots, x_n) \\
F(-\infty, x_2, \ldots, x_k \mid x_{k+1}, \ldots, x_n) &= 0
\end{aligned}
$$

Statistically Independent Random Variables We have already defined
statistical independence of two or more events of a sample space S. The
concept of statistical independence can be extended to random variables 
defined on a sample space generated by a combined experiment or by repeated 
trials of a single experiment. If the experiments result in mutually exclusive 
outcomes, the probability of an outcome in one experiment is independent of 
an outcome in any other experiment. That is, the joint probability of the 
outcomes factors into a product of the probabilities corresponding to each 
outcome. Consequently, the random variables corresponding to the outcomes 
in these experiments are independent in the sense that their joint pdf factors 
into a product of marginal pdfs. Hence the multidimensional random variables 
are statistically independent if and only if 

$$ F(x_1, x_2, \ldots, x_n) = F(x_1)F(x_2) \cdots F(x_n) \tag{2-1-37} $$

or, alternatively,

$$ p(x_1, x_2, \ldots, x_n) = p(x_1)p(x_2) \cdots p(x_n) \tag{2-1-38} $$
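For discrete random variables, the analogue of (2-1-38) is that the joint probabilities factor into the product of the marginal probabilities. The sketch below tests this for two hypothetical pairs of variables built from two fair dice: the two faces themselves, and the first face together with the sum of the faces. The dictionary representation and helper names are assumptions of this example.

```python
from fractions import Fraction
from itertools import product


def marginals(joint):
    """Marginal probabilities of each variable from a joint table {(x1, x2): p}."""
    p1, p2 = {}, {}
    for (x1, x2), p in joint.items():
        p1[x1] = p1.get(x1, 0) + p
        p2[x2] = p2.get(x2, 0) + p
    return p1, p2


def is_independent(joint):
    """True when every joint probability equals the product of the marginals."""
    p1, p2 = marginals(joint)
    return all(joint.get((x1, x2), 0) == p1[x1] * p2[x2] for x1 in p1 for x2 in p2)


# Faces of two fair dice, and the first face paired with the sum of both faces
TWO_DICE = {(i, j): Fraction(1, 36) for i, j in product(range(1, 7), repeat=2)}
DIE_AND_SUM = {(i, i + j): Fraction(1, 36) for i, j in product(range(1, 7), repeat=2)}

if __name__ == "__main__":
    print(is_independent(TWO_DICE), is_independent(DIE_AND_SUM))
```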

2-1-2 Functions of Random Variables 

A problem that arises frequently in practical applications of probability is the 
following. Given a random variable X, which is characterized by its pdf p(x),
determine the pdf of the random variable Y = g(X), where g(X) is some given
function of X. When the mapping g from X to Y is one-to-one, the
determination of p(y) is relatively straightforward. However, when the
mapping is not one-to-one, as is the case, for example, when $Y = X^2$, we must
be very careful in our derivation of p(y).

Example 2-1-1 

Consider the random variable Y defined as 

Y = aX+b (2-1-39) 

where a and b are constants. We assume that a > 0. If a < 0, the approach is
similar (see Problem 2-3). We note that this mapping, illustrated in Fig.
2-1-4(a), is linear and monotonic. Let $F_X(x)$ and $F_Y(y)$ denote the cdfs for X
and Y, respectively.† Then

$$ F_Y(y) = P(Y \le y) = P(aX + b \le y) = P\left(X \le \frac{y - b}{a}\right) = \int_{-\infty}^{(y-b)/a} p_X(x)\,dx = F_X\left(\frac{y - b}{a}\right) \tag{2-1-40} $$

† To avoid confusion in changing variables, subscripts are used in the respective pdfs and cdfs.

FIGURE 2-1-4 A linear transformation of a random variable X and an example of the corresponding pdfs of X
and Y.


By differentiating (2-1-40) with respect to y, we obtain the relationship
between the respective pdfs. It is

$$ p_Y(y) = \frac{1}{a}\,p_X\left(\frac{y - b}{a}\right) \tag{2-1-41} $$

Thus (2-1-40) and (2-1-41) specify the cdf and pdf of the random variable Y
in terms of the cdf and pdf of the random variable X for the linear
transformation in (2-1-39). To illustrate this mapping for a specific pdf
$p_X(x)$, consider the one shown in Fig. 2-1-4(b). The pdf $p_Y(y)$ that results
from the mapping in (2-1-39) is shown in Fig. 2-1-4(c).
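The relations (2-1-40) and (2-1-41) can be checked against each other numerically: the derivative of the cdf of Y should agree with the pdf of Y. In the sketch below the pdf of X (an exponential pdf), the constants a = 2 and b = 1, and the central-difference step are all assumptions made for illustration.

```python
import math

A_COEF, B_COEF = 2.0, 1.0   # assumed constants a > 0 and b in Y = aX + b


def p_x(x):
    """Assumed pdf of X: exp(-x) for x >= 0."""
    return math.exp(-x) if x >= 0.0 else 0.0


def F_x(x):
    """cdf of X corresponding to p_x."""
    return 1.0 - math.exp(-x) if x >= 0.0 else 0.0


def F_y(y):
    """cdf of Y from (2-1-40)."""
    return F_x((y - B_COEF) / A_COEF)


def p_y(y):
    """pdf of Y from (2-1-41)."""
    return p_x((y - B_COEF) / A_COEF) / A_COEF


def numerical_derivative(f, y, h=1e-5):
    """Central-difference estimate of df/dy."""
    return (f(y + h) - f(y - h)) / (2.0 * h)


if __name__ == "__main__":
    y = 3.0
    print(p_y(y), numerical_derivative(F_y, y))
```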


Example 2-1-2 

Consider the random variable Y defined as 


$$ Y = aX^3 + b, \qquad a > 0 \tag{2-1-42} $$

As in Example 2-1-1, the mapping between X and Y is one-to-one. Hence

$$ F_Y(y) = P(Y \le y) = P(aX^3 + b \le y) = P\left(X \le \left(\frac{y - b}{a}\right)^{1/3}\right) = F_X\left(\left(\frac{y - b}{a}\right)^{1/3}\right) \tag{2-1-43} $$

FIGURE 2-1-5 A quadratic transformation of the random variable X.

Differentiation of (2-1-43) with respect to y yields the desired relationship
between the two pdfs as

$$ p_Y(y) = \frac{1}{3a[(y - b)/a]^{2/3}}\,p_X\left(\left(\frac{y - b}{a}\right)^{1/3}\right) \tag{2-1-44} $$


Example 2-1-3 

The random variable Y is defined as 

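$$ Y = aX^2 + b, \qquad a > 0 \tag{2-1-45} $$

In contrast to Examples 2-1-1 and 2-1-2, the mapping between X and Y,
illustrated in Fig. 2-1-5, is not one-to-one. To determine the cdf of Y, we
observe that

$$ F_Y(y) = P(Y \le y) = P(aX^2 + b \le y) = P\left(|X| \le \sqrt{\frac{y - b}{a}}\right) $$

Hence

$$ F_Y(y) = F_X\left(\sqrt{\frac{y - b}{a}}\right) - F_X\left(-\sqrt{\frac{y - b}{a}}\right) \tag{2-1-46} $$

Differentiating (2-1-46) with respect to y, we obtain the pdf of Y in terms of
the pdf of X in the form

$$ p_Y(y) = \frac{p_X[\sqrt{(y - b)/a}]}{2a\sqrt{(y - b)/a}} + \frac{p_X[-\sqrt{(y - b)/a}]}{2a\sqrt{(y - b)/a}} \tag{2-1-47} $$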


In Example 2-1-3, we observe that the equation g(x) = ax 2 + b = y has two 
real solutions, 


*i = 


l y~b 

\ a 
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and that p Y (y) consists of two terms corresponding to these two solutions. 
That is. 


Px[x i - rfy - b)fa] | Px[x 2 = ~^(y ~ b)/a] 
|g'[jr, = V(y - b)la]\ = ~ V(_y - b)la\\ 


(2-1-48) 


where $g'(x)$ denotes the first derivative of $g(x)$. 

In the general case, suppose that $x_1, x_2, \ldots, x_n$ are the real roots of the 
equation $g(x) = y$. Then the pdf of the random variable $Y = g(X)$ may be 
expressed as 

$$ p_Y(y) = \sum_{i=1}^{n} \frac{p_X(x_i)}{|g'(x_i)|} \tag{2-1-49} $$

where the roots $x_i$, $i = 1, 2, \ldots, n$, are functions of y. 
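The root-sum formula can be checked numerically. The sketch below is an illustration only, not part of the original text: it assumes a zero-mean, unit-variance gaussian for $p_X$, takes $g(x) = ax^2 + b$ with arbitrarily chosen a and b, and compares the pdf given by the root formula with a central-difference derivative of the cdf $F_X(\sqrt{(y-b)/a}) - F_X(-\sqrt{(y-b)/a})$. All function names are hypothetical.

```python
import math

def gauss_pdf(x):
    """Zero-mean, unit-variance gaussian pdf (an assumed, illustrative p_X)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def gauss_cdf(x):
    """Matching cdf F_X, written with the standard-library error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pdf_via_roots(y, a, b):
    """p_Y(y) for Y = a*X**2 + b (a > 0): sum of p_X(x_i)/|g'(x_i)| over the real roots."""
    if y <= b:
        return 0.0  # g(x) = y has no real roots here, so the pdf is zero
    r = math.sqrt((y - b) / a)
    # g'(x) = 2*a*x, evaluated at the two roots +r and -r
    return sum(gauss_pdf(x) / abs(2.0 * a * x) for x in (r, -r))

def cdf_direct(y, a, b):
    """F_Y(y) = F_X(r) - F_X(-r) with r = sqrt((y - b)/a)."""
    if y <= b:
        return 0.0
    r = math.sqrt((y - b) / a)
    return gauss_cdf(r) - gauss_cdf(-r)

a, b, y, h = 2.0, 1.0, 3.5, 1e-5   # arbitrary test values
formula = pdf_via_roots(y, a, b)
numeric = (cdf_direct(y + h, a, b) - cdf_direct(y - h, a, b)) / (2.0 * h)
print(formula, numeric)
```

The two printed numbers should agree to several decimal places, which is what differentiating the cdf predicts.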

Now let us consider functions of multidimensional random variables. 
Suppose that $X_i$, $i = 1, 2, \ldots, n$, are random variables with joint pdf 
$p_X(x_1, x_2, \ldots, x_n)$, and let $Y_i$, $i = 1, 2, \ldots, n$, be another set of n random 
variables related to the $X_i$ by the functions 

$$ Y_i = g_i(X_1, X_2, \ldots, X_n), \qquad i = 1, 2, \ldots, n \tag{2-1-50} $$

We assume that the $g_i(X_1, X_2, \ldots, X_n)$, $i = 1, 2, \ldots, n$, are single-valued 
functions with continuous partial derivatives and invertible. By "invertible" we 
mean that the $X_i$, $i = 1, 2, \ldots, n$, can be expressed as functions of $Y_i$, 
$i = 1, 2, \ldots, n$, in the form 

$$ X_i = g_i^{-1}(Y_1, Y_2, \ldots, Y_n), \qquad i = 1, 2, \ldots, n \tag{2-1-51} $$

where the inverse functions are also assumed to be single-valued with 
continuous partial derivatives. The problem is to determine the joint pdf of $Y_i$, 
$i = 1, 2, \ldots, n$, denoted by $p_Y(y_1, y_2, \ldots, y_n)$, given the joint pdf 
$p_X(x_1, x_2, \ldots, x_n)$. 

To determine the desired relation, let $R_X$ be the region in the n-dimensional 
space of the random variables $X_i$, $i = 1, 2, \ldots, n$, and let $R_Y$ be the 
(one-to-one) mapping of $R_X$ defined by the functions $Y_i = g_i(X_1, X_2, \ldots, X_n)$. 
Clearly, 

$$ \int\!\!\cdots\!\!\int_{R_Y} p_Y(y_1, y_2, \ldots, y_n)\, dy_1\, dy_2 \cdots dy_n = \int\!\!\cdots\!\!\int_{R_X} p_X(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n \tag{2-1-52} $$


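By making a change in variables in the multiple integral on the right-hand side 
of (2-1-52) with the substitution 

$$ x_i = g_i^{-1}(y_1, y_2, \ldots, y_n) \equiv g_i^{-1}, \qquad i = 1, 2, \ldots, n $$

we obtain 

$$ \int\!\!\cdots\!\!\int_{R_Y} p_Y(y_1, y_2, \ldots, y_n)\, dy_1\, dy_2 \cdots dy_n = \int\!\!\cdots\!\!\int_{R_Y} p_X(x_1 = g_1^{-1}, x_2 = g_2^{-1}, \ldots, x_n = g_n^{-1})\, |J|\, dy_1\, dy_2 \cdots dy_n \tag{2-1-53} $$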


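where J denotes the jacobian of the transformation, defined by the determinant 

$$ J = \begin{vmatrix} \dfrac{\partial g_1^{-1}}{\partial y_1} & \cdots & \dfrac{\partial g_n^{-1}}{\partial y_1} \\ \vdots & & \vdots \\ \dfrac{\partial g_1^{-1}}{\partial y_n} & \cdots & \dfrac{\partial g_n^{-1}}{\partial y_n} \end{vmatrix} \tag{2-1-54} $$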


Consequently, the desired relation for the joint pdf of the $Y_i$, $i = 1, 2, \ldots, n$, is 

$$ p_Y(y_1, y_2, \ldots, y_n) = p_X(x_1 = g_1^{-1}, x_2 = g_2^{-1}, \ldots, x_n = g_n^{-1})\, |J| \tag{2-1-55} $$


Example 2-1-4 

An important functional relation between two sets of n-dimensional random 
variables that frequently arises in practice is the linear transformation 

$$ Y_i = \sum_{j=1}^{n} a_{ij} X_j, \qquad i = 1, 2, \ldots, n \tag{2-1-56} $$

where the $\{a_{ij}\}$ are constants. It is convenient to employ the matrix form for 
the transformation, which is 

Y = AX (2-1-57) 

where X and Y are n-dimensional vectors and A is an $n \times n$ matrix. We 
assume that A is nonsingular. Then A is invertible and, hence, 

$$ X = A^{-1} Y \tag{2-1-58} $$

Equivalently, we have 

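$$ X_i = \sum_{j=1}^{n} b_{ij} Y_j, \qquad i = 1, 2, \ldots, n \tag{2-1-59} $$

where $\{b_{ij}\}$ are the elements of the inverse matrix $A^{-1}$. The jacobian of this 
transformation is $J = 1/\det A$. Hence 

$$ p_Y(y_1, y_2, \ldots, y_n) = p_X\!\left(x_1 = \sum_{j=1}^{n} b_{1j} y_j,\; x_2 = \sum_{j=1}^{n} b_{2j} y_j,\; \ldots,\; x_n = \sum_{j=1}^{n} b_{nj} y_j\right) \frac{1}{|\det A|} \tag{2-1-60} $$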
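As a sanity check on the linear-transformation result, the following sketch evaluates $p_Y(y) = p_X(A^{-1}y)/|\det A|$ for a 2 x 2 case and confirms numerically that it integrates to approximately one. The matrix A and the choice of two independent unit-variance gaussians for $p_X$ are assumptions made for illustration, and the helper names are hypothetical.

```python
import math

def p_x(x1, x2):
    """Assumed joint pdf: two independent zero-mean, unit-variance gaussians."""
    return math.exp(-(x1 * x1 + x2 * x2) / 2.0) / (2.0 * math.pi)

# Illustrative nonsingular matrix A for Y = A X
A = [[2.0, 1.0],
     [0.0, 1.0]]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]

# B = inverse of A (explicit 2 x 2 formula), i.e. the b_ij of the text
B = [[ A[1][1] / det_A, -A[0][1] / det_A],
     [-A[1][0] / det_A,  A[0][0] / det_A]]

def p_y(y1, y2):
    """Joint pdf of Y: p_X evaluated at x = B y, scaled by |J| = 1/|det A|."""
    x1 = B[0][0] * y1 + B[0][1] * y2
    x2 = B[1][0] * y1 + B[1][1] * y2
    return p_x(x1, x2) / abs(det_A)

# Riemann sum over a grid wide enough to hold essentially all of the mass
step = 0.1
grid = [-15.0 + step * k for k in range(301)]
total = sum(p_y(u, v) for u in grid for v in grid) * step * step
print(total)
```

Without the factor $1/|\det A|$ the sum would come out near $|\det A|$ rather than one, which is a quick way to see why the jacobian is needed.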
2-1-3 Statistical Averages of Random Variables 

Averages play an important role in the characterization of the outcomes of 
experiments and the random variables defined on the sample space of the 
experiments. Of particular interest are the first and second moments of a single 
random variable and the joint moments, such as the correlation and covariance, 
between any pair of random variables in a multidimensional set of 
random variables. Also of great importance are the characteristic function for a 
single random variable and the joint characteristic function for a multidimen- 
sional set of random variables. This section is devoted to the definition of these 
important statistical averages. 

First we consider a single random variable X characterized by its pdf p(x). 
The mean or expected value of X is defined as 


$$ E(X) \equiv m_x = \int_{-\infty}^{\infty} x\, p(x)\, dx \tag{2-1-61} $$

where $E(\cdot)$ denotes expectation (statistical averaging). This is the first moment 
of the random variable X. In general, the nth moment is defined as 

$$ E(X^n) = \int_{-\infty}^{\infty} x^n p(x)\, dx \tag{2-1-62} $$


Now, suppose that we define a random variable $Y = g(X)$, where $g(X)$ is 
some arbitrary function of the random variable X. The expected value of Y is 

$$ E(Y) = E[g(X)] = \int_{-\infty}^{\infty} g(x)\, p(x)\, dx \tag{2-1-63} $$

In particular, if $Y = (X - m_x)^n$, where $m_x$ is the mean value of X, then 

$$ E(Y) = E[(X - m_x)^n] = \int_{-\infty}^{\infty} (x - m_x)^n p(x)\, dx \tag{2-1-64} $$


This expected value is called the nth central moment of the random variable X, 
because it is a moment taken relative to the mean. When n = 2, the central 
moment is called the variance of the random variable and denoted as $\sigma_x^2$. 
That is, 

$$ \sigma_x^2 = \int_{-\infty}^{\infty} (x - m_x)^2 p(x)\, dx \tag{2-1-65} $$


This parameter provides a measure of the dispersion of the random variable X. 
By expanding the term (x - m x ) 2 in the integral of (2-1-65) and noting that the 
expected value of a constant is equal to the constant, we obtain the expression 
that relates the variance to the first and second moments, namely, 

$$ \sigma_x^2 = E(X^2) - [E(X)]^2 = E(X^2) - m_x^2 \tag{2-1-66} $$
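The relation between the variance and the first two moments is easy to confirm numerically. In the sketch below, which is illustrative only, the pdf is assumed to be gaussian with mean 1 and standard deviation 2, and the integrals are approximated by a midpoint rule over a range wide enough that the truncated tails are negligible. The helper `integrate` is a hypothetical name.

```python
import math

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

def pdf(x, m=1.0, sigma=2.0):
    """Assumed example pdf: gaussian with mean m and standard deviation sigma."""
    return math.exp(-(x - m) ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

lo, hi = -30.0, 32.0  # about 15 standard deviations on each side of the mean

mean = integrate(lambda x: x * pdf(x), lo, hi)               # first moment
second = integrate(lambda x: x * x * pdf(x), lo, hi)         # second moment
variance_direct = integrate(lambda x: (x - mean) ** 2 * pdf(x), lo, hi)  # second central moment
variance_moments = second - mean ** 2                        # E(X^2) - m_x^2

print(mean, variance_direct, variance_moments)
```

Both variance estimates should be close to 4, the square of the assumed standard deviation.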
In the case of two random variables, $X_1$ and $X_2$, with joint pdf $p(x_1, x_2)$, we 
define the joint moment as 

$$ E(X_1^k X_2^n) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1^k x_2^n\, p(x_1, x_2)\, dx_1\, dx_2 \tag{2-1-67} $$

and the joint central moment as 

$$ E[(X_1 - m_1)^k (X_2 - m_2)^n] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_1 - m_1)^k (x_2 - m_2)^n\, p(x_1, x_2)\, dx_1\, dx_2 \tag{2-1-68} $$

where $m_i = E(X_i)$. Of particular importance to us are the joint moment and 
joint central moment corresponding to k = n = 1. These joint moments are 
called the correlation and the covariance of the random variables $X_1$ and $X_2$, 
respectively. 

In considering multidimensional random variables, we can define joint 
moments of any order. However, the moments that are most useful in practical 
applications are the correlations and covariances between pairs of random 
variables. To elaborate, suppose that $X_i$, $i = 1, 2, \ldots, n$, are random variables 
with joint pdf $p(x_1, x_2, \ldots, x_n)$. Let $p(x_i, x_j)$ be the joint pdf of the random 
variables $X_i$ and $X_j$. Then the correlation between $X_i$ and $X_j$ is given by the 
joint moment 

$$ E(X_i X_j) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_i x_j\, p(x_i, x_j)\, dx_i\, dx_j \tag{2-1-69} $$

and the covariance of $X_i$ and $X_j$ is 

$$ \mu_{ij} = E[(X_i - m_i)(X_j - m_j)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_i - m_i)(x_j - m_j)\, p(x_i, x_j)\, dx_i\, dx_j = E(X_i X_j) - m_i m_j \tag{2-1-70} $$

The $n \times n$ matrix with elements $\mu_{ij}$ is called the covariance matrix of the 
random variables $X_i$, $i = 1, 2, \ldots, n$. We shall encounter the covariance matrix 
in our discussion of jointly gaussian random variables in Section 2-1-4. 

Two random variables are said to be uncorrelated if $E(X_i X_j) = 
E(X_i)E(X_j) = m_i m_j$. In that case, the covariance $\mu_{ij} = 0$. We note that when $X_i$ 
and $X_j$ are statistically independent, they are also uncorrelated. However, if $X_i$ 
and $X_j$ are uncorrelated, they are not necessarily statistically independent. 

Two random variables are said to be orthogonal if $E(X_i X_j) = 0$. We note 
that this condition holds when $X_i$ and $X_j$ are uncorrelated and either one or 
both of the random variables have zero mean. 


Characteristic Functions The characteristic function of a random variable 
X is defined as the statistical average 

$$ E(e^{jvX}) \equiv \psi(jv) = \int_{-\infty}^{\infty} e^{jvx} p(x)\, dx \tag{2-1-71} $$

where the variable v is real and $j = \sqrt{-1}$. We note that $\psi(jv)$ may be described 
as the Fourier transform† of the pdf p(x). Hence the inverse Fourier transform is 

$$ p(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \psi(jv)\, e^{-jvx}\, dv \tag{2-1-72} $$


One useful property of the characteristic function is its relation to the 
moments of the random variable. We note that the first derivative of (2-1-71) 
with respect to v yields 

$$ \frac{d\psi(jv)}{dv} = j \int_{-\infty}^{\infty} x\, e^{jvx} p(x)\, dx $$

By evaluating the derivative at v = 0, we obtain the first moment (mean) 

$$ E(X) = m_x = -j \left. \frac{d\psi(jv)}{dv} \right|_{v=0} \tag{2-1-73} $$

The differentiation process can be repeated, so that the nth derivative of $\psi(jv)$ 
evaluated at v = 0 yields the nth moment 

$$ E(X^n) = (-j)^n \left. \frac{d^n\psi(jv)}{dv^n} \right|_{v=0} \tag{2-1-74} $$

Thus the moments of a random variable can be determined from the 
characteristic function. On the other hand, suppose that the characteristic 
function can be expanded in a Taylor series about the point v = 0. That is, 

$$ \psi(jv) = \sum_{n=0}^{\infty} \left[ \frac{d^n\psi(jv)}{dv^n} \right]_{v=0} \frac{v^n}{n!} \tag{2-1-75} $$


Using the relation in (2-1-74) to eliminate the derivative in (2-1-75), we obtain 
an expression for the characteristic function in terms of its moments in the 
form 

$$ \psi(jv) = \sum_{n=0}^{\infty} E(X^n) \frac{(jv)^n}{n!} \tag{2-1-76} $$

† Usually the Fourier transform of a function g(u) is defined as $G(v) = \int_{-\infty}^{\infty} g(u)\, e^{-juv}\, du$, which 
differs from (2-1-71) by the negative sign in the exponential. This is a trivial difference, however, so 
we call the integral in (2-1-71) a Fourier transform. 
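The moment relations $E(X) = -j\,\psi'(0)$ and $E(X^2) = (-j)^2\psi''(0)$ can be illustrated with finite differences. The sketch below assumes, purely as an example, a pdf that is uniform on [0, 1], for which the first two moments are 1/2 and 1/3. The characteristic function is computed by a midpoint-rule integral, and the step size used for the differences is an arbitrary choice.

```python
import cmath

def char_fn(v, steps=2000):
    """psi(jv) = integral of exp(j*v*x) p(x) dx, for the assumed p(x) = 1 on [0, 1]."""
    h = 1.0 / steps
    return h * sum(cmath.exp(1j * v * (k + 0.5) * h) for k in range(steps))

h = 1e-3  # finite-difference step in v
d1 = (char_fn(h) - char_fn(-h)) / (2.0 * h)                      # approximates psi'(0)
d2 = (char_fn(h) - 2.0 * char_fn(0.0) + char_fn(-h)) / (h * h)   # approximates psi''(0)

mean = (-1j * d1).real             # first moment:  -j * psi'(0)
second = ((-1j) ** 2 * d2).real    # second moment: (-j)^2 * psi''(0)
print(mean, second)
```

The printed values should be close to 0.5 and 0.3333, the known moments of the assumed uniform pdf.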

The characteristic function provides a simple method for determining the 
pdf of a sum of statistically independent random variables. To illustrate this 
point, let $X_i$, $i = 1, 2, \ldots, n$, be a set of n statistically independent random 
variables and let 

$$ Y = \sum_{i=1}^{n} X_i \tag{2-1-77} $$

The problem is to determine the pdf of Y. We shall determine the pdf of Y by 
first finding its characteristic function and then computing the inverse Fourier 
transform. Thus 

$$ \psi_Y(jv) = E(e^{jvY}) = E\!\left[\exp\!\left(jv \sum_{i=1}^{n} X_i\right)\right] = E\!\left[\prod_{i=1}^{n} e^{jvX_i}\right] = \int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} \left(\prod_{i=1}^{n} e^{jvx_i}\right) p(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n \tag{2-1-78} $$

Since the random variables are statistically independent, $p(x_1, x_2, \ldots, x_n) = 
p(x_1)p(x_2) \cdots p(x_n)$, and, hence, the nth-order integral in (2-1-78) reduces to a 
product of n single integrals, each corresponding to the characteristic function 
of one of the $X_i$. Hence, 

$$ \psi_Y(jv) = \prod_{i=1}^{n} \psi_{X_i}(jv) \tag{2-1-79} $$

If, in addition to their statistical independence, the $X_i$ are identically 
distributed, then all the $\psi_{X_i}(jv)$ are identical. Consequently, 

$$ \psi_Y(jv) = [\psi_X(jv)]^n \tag{2-1-80} $$

Finally, the pdf of Y is determined from the inverse Fourier transform of 
$\psi_Y(jv)$, given by (2-1-72). 

Since the characteristic function of the sum of n statistically independent 
random variables is equal to the product of the characteristic functions of the 
individual random variables $X_i$, $i = 1, 2, \ldots, n$, and multiplication in the 
transform domain corresponds to convolution of the pdfs, it follows that 
the pdf of Y is the n-fold convolution of the pdfs of the $X_i$. Usually 
the n-fold convolution is more difficult to perform than the characteristic 
function method described above in determining the pdf of Y. 
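The equivalence between multiplying characteristic functions and convolving distributions can be seen on a small discrete example. The sketch below is an illustration under assumed values: two made-up probability mass functions on the nonnegative integers are convolved, and the characteristic function of the result is compared with the product of the individual characteristic functions at one arbitrary value of v.

```python
import cmath

def convolve(p, q):
    """Distribution of the sum of two independent integer-valued variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for k, qk in enumerate(q):
            out[i + k] += pi * qk
    return out

def char_fn(pmf, v):
    """psi(jv) = sum over k of P(X = k) * exp(j*v*k)."""
    return sum(pk * cmath.exp(1j * v * k) for k, pk in enumerate(pmf))

p1 = [0.2, 0.5, 0.3]   # assumed distribution of X1 on {0, 1, 2}
p2 = [0.6, 0.4]        # assumed distribution of X2 on {0, 1}
p_sum = convolve(p1, p2)

v = 0.7                # arbitrary test point
lhs = char_fn(p_sum, v)
rhs = char_fn(p1, v) * char_fn(p2, v)
print(p_sum, abs(lhs - rhs))
```

The printed difference should be at the level of floating-point rounding, reflecting the product rule for independent variables.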

When working with n-dimensional random variables, it is appropriate to 
define an n-dimensional Fourier transform of the joint pdf. In particular, if 
$X_i$, $i = 1, 2, \ldots, n$, are random variables with pdf $p(x_1, x_2, \ldots, x_n)$, the 
n-dimensional characteristic function is defined as 

$$ \psi(jv_1, jv_2, \ldots, jv_n) = E\!\left[\exp\!\left(j \sum_{i=1}^{n} v_i X_i\right)\right] = \int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} \exp\!\left(j \sum_{i=1}^{n} v_i x_i\right) p(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n \tag{2-1-81} $$

Of special interest is the two-dimensional characteristic function 

$$ \psi(jv_1, jv_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{j(v_1 x_1 + v_2 x_2)}\, p(x_1, x_2)\, dx_1\, dx_2 \tag{2-1-82} $$


We observe that the partial derivatives of $\psi(jv_1, jv_2)$ with respect to $v_1$ and $v_2$ 
can be used to generate the joint moments. For example, it is easy to show that 

$$ E(X_1 X_2) = -\left. \frac{\partial^2 \psi(jv_1, jv_2)}{\partial v_1\, \partial v_2} \right|_{v_1 = v_2 = 0} \tag{2-1-83} $$


Higher-order moments are generated in a straightforward manner. 


2-1-4 Some Useful Probability Distributions 

In subsequent chapters, we shall encounter several different types of random 
variables. In this section we list these frequently encountered random 
variables, their pdfs, their cdfs, and their moments. We begin with the binomial 
distribution, which is the distribution of a discrete random variable, and then 
we present the distributions of several continuous random variables. 

Binomial Distribution Let X be a discrete random variable that has two 
possible values, say X = 1 or X = 0, with probabilities p and 1 - p, 
respectively. The pdf of X is shown in Fig. 2-1-6. Now, suppose that 

$$ Y = \sum_{i=1}^{n} X_i $$

where the $X_i$, $i = 1, 2, \ldots, n$, are statistically independent and identically 
distributed random variables with the pdf shown in Fig. 2-1-6. What is the 
probability distribution function of Y? 

FIGURE 2-1-6 The probability distribution function of X. 

To answer this question, we observe that the range of Y is the set of 
integers from 0 to n. The probability that Y = 0 is simply the probability that 
all the $X_i = 0$. Since the $X_i$ are statistically independent, 

$$ P(Y = 0) = (1 - p)^n $$

The probability that Y = 1 is simply the probability that one $X_i = 1$ and the rest 
of the $X_i = 0$. Since this event can occur in n different ways, 

$$ P(Y = 1) = np(1 - p)^{n-1} $$

To generalize, the probability that Y = k is the probability that k of the $X_i$ are 
equal to one and n - k are equal to zero. Since there are 

$$ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} \tag{2-1-84} $$

different combinations that result in the event {Y = k}, it follows that 

$$ P(Y = k) = \binom{n}{k} p^k (1 - p)^{n-k} \tag{2-1-85} $$

where $\binom{n}{k}$ is the binomial coefficient. Consequently, the pdf of Y may be 
expressed as 

$$ p(y) = \sum_{k=0}^{n} P(Y = k)\, \delta(y - k) = \sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n-k}\, \delta(y - k) \tag{2-1-86} $$


The cdf of Y is 

$$ F(y) = P(Y \le y) = \sum_{k=0}^{[y]} \binom{n}{k} p^k (1 - p)^{n-k} \tag{2-1-87} $$

where [y] denotes the largest integer m such that $m \le y$. The cdf in (2-1-87) 
characterizes a binomially distributed random variable. 

The first two moments of Y are 

$$ E(Y) = np, \qquad E(Y^2) = np(1 - p) + n^2 p^2, \qquad \sigma^2 = np(1 - p) \tag{2-1-88} $$

and the characteristic function is 

$$ \psi(jv) = (1 - p + p e^{jv})^n \tag{2-1-89} $$
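The binomial probabilities, moments, and characteristic function above can be cross-checked directly. The following sketch uses the arbitrary values n = 10 and p = 0.3, chosen only for illustration, and relies on `math.comb` from the Python standard library for the binomial coefficient.

```python
import cmath
import math

def binomial_pmf(k, n, p):
    """P(Y = k) = C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

n, p = 10, 0.3   # arbitrary example values
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)
mean = sum(k * pk for k, pk in enumerate(pmf))         # should equal n p
second = sum(k * k * pk for k, pk in enumerate(pmf))   # should equal n p (1 - p) + n^2 p^2
variance = second - mean ** 2                          # should equal n p (1 - p)

# Characteristic function: direct sum versus the closed form (1 - p + p e^{jv})^n
v = 0.4
psi_sum = sum(pk * cmath.exp(1j * v * k) for k, pk in enumerate(pmf))
psi_closed = (1.0 - p + p * cmath.exp(1j * v)) ** n

print(total, mean, second, variance, abs(psi_sum - psi_closed))
```

For these values the mean is 3, the second moment 11.1, and the variance 2.1, up to rounding.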
FIGURE 2-1-7 The pdf and cdf of a uniformly distributed random variable. 

FIGURE 2-1-8 The pdf and cdf of a gaussian-distributed random variable. 


where erf (x) denotes the error function, defined as 

$$ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt \tag{2-1-94} $$


The pdf and cdf are illustrated in Fig. 2-1-8. 

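The cdf F(x) may also be expressed in terms of the complementary error 
function. That is, 

$$ F(x) = 1 - \tfrac{1}{2} \operatorname{erfc}\!\left(\frac{x - m_x}{\sqrt{2}\,\sigma}\right) $$

where 

$$ \operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\, dt = 1 - \operatorname{erf}(x) \tag{2-1-95} $$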


We note that erf (-x) = -erf (x), erfc (-x) = 2 - erfc (x), erf (0) = erfc (∞) = 
0, and erf (∞) = erfc (0) = 1. For $x > m_x$, the complementary error function is 
proportional to the area under the tail of the gaussian pdf. For large values of 
x, the complementary error function erfc (x) may be approximated by the 
asymptotic series 

$$ \operatorname{erfc}(x) = \frac{e^{-x^2}}{x\sqrt{\pi}} \left( 1 - \frac{1}{2x^2} + \frac{1 \cdot 3}{2^2 x^4} - \frac{1 \cdot 3 \cdot 5}{2^3 x^6} + \cdots \right) \tag{2-1-96} $$


where the approximation error is less than the last term used. 

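The function that is frequently used for the area under the tail of the 
gaussian pdf is denoted by Q(x) and defined as 

$$ Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt, \qquad x \ge 0 \tag{2-1-97} $$

By comparing (2-1-95) with (2-1-97), we find 

$$ Q(x) = \tfrac{1}{2} \operatorname{erfc}\!\left(\frac{x}{\sqrt{2}}\right) \tag{2-1-98} $$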
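In software, Q(x) is usually computed from the complementary error function. The sketch below expresses Q(x) through `math.erfc` from the Python standard library and compares a truncated version of the asymptotic series for erfc (x) with the library value at x = 3. The function names and the number of series terms are illustrative choices.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = (1/2) erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def erfc_asymptotic(x, terms=4):
    """Partial sum of the large-x series: 1 - 1/(2x^2) + 1*3/(2^2 x^4) - ..."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(2 * k + 1) / (2.0 * x * x)   # builds the next alternating term
    return math.exp(-x * x) / (x * math.sqrt(math.pi)) * total

x = 3.0
exact = math.erfc(x)
approx = erfc_asymptotic(x)
rel_error = abs(approx - exact) / exact
print(q_function(0.0), q_function(1.0), exact, approx, rel_error)
```

At x = 3 the four-term series is already within a fraction of a percent of the library value. The series is asymptotic, so for small x adding terms does not help.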
The characteristic function of a gaussian random variable with mean $m_x$ and 
variance $\sigma^2$ is 

$$ \psi(jv) = \int_{-\infty}^{\infty} e^{jvx} \left[ \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - m_x)^2/2\sigma^2} \right] dx = e^{jvm_x - (1/2)v^2\sigma^2} \tag{2-1-99} $$


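The central moments of a gaussian random variable are 

$$ E[(X - m_x)^k] \equiv \mu_k = \begin{cases} 1 \cdot 3 \cdots (k-1)\,\sigma^k & (\text{even } k) \\ 0 & (\text{odd } k) \end{cases} \tag{2-1-100} $$

and the ordinary moments may be expressed in terms of the central moments 
as 

$$ E(X^k) = \sum_{i=0}^{k} \binom{k}{i} m_x^i\, \mu_{k-i} \tag{2-1-101} $$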


The sum of n statistically independent gaussian random variables is also a 
gaussian random variable. To demonstrate this point, let 

$$ Y = \sum_{i=1}^{n} X_i \tag{2-1-102} $$

where the $X_i$, $i = 1, 2, \ldots, n$, are statistically independent gaussian random 
variables with means $m_i$ and variances $\sigma_i^2$. Using the result in (2-1-79), we find 
that the characteristic function of Y is 

$$ \psi_Y(jv) = \prod_{i=1}^{n} \psi_{X_i}(jv) = \prod_{i=1}^{n} e^{jvm_i - v^2\sigma_i^2/2} = e^{jvm_y - v^2\sigma_y^2/2} \tag{2-1-103} $$

where 

$$ m_y = \sum_{i=1}^{n} m_i, \qquad \sigma_y^2 = \sum_{i=1}^{n} \sigma_i^2 \tag{2-1-104} $$

Therefore, Y is gaussian-distributed with mean $m_y$ and variance $\sigma_y^2$. 


Chi-Square Distribution A chi-square-distributed random variable is related 
to a gaussian-distributed random variable in the sense that the former can 
be viewed as a transformation of the latter. To be specific, let $Y = X^2$, where X 
is a gaussian random variable. Then Y has a chi-square distribution. We 
distinguish between two types of chi-square distributions. The first is called a 
central chi-square distribution and is obtained when X has zero mean. The 
second is called a noncentral chi-square distribution, and is obtained when X 
has a nonzero mean. 

First we consider the central chi-square distribution. Let X be gaussian-distributed 
with zero mean and variance $\sigma^2$. Since $Y = X^2$, the result given in 
(2-1-47) applies directly with a = 1 and b = 0. Thus we obtain the pdf of Y in 
the form 

$$ p_Y(y) = \frac{1}{\sqrt{2\pi y}\,\sigma}\, e^{-y/2\sigma^2}, \qquad y \ge 0 \tag{2-1-105} $$

The cdf of Y is 

$$ F_Y(y) = \int_0^y p_Y(u)\, du = \frac{1}{\sqrt{2\pi}\,\sigma} \int_0^y \frac{1}{\sqrt{u}}\, e^{-u/2\sigma^2}\, du \tag{2-1-106} $$

which cannot be expressed in closed form. The characteristic function, 
however, can be determined in closed form. It is 

$$ \psi(jv) = \frac{1}{(1 - j2v\sigma^2)^{1/2}} \tag{2-1-107} $$

Now, suppose that the random variable Y is defined as 

$$ Y = \sum_{i=1}^{n} X_i^2 \tag{2-1-108} $$

where the $X_i$, $i = 1, 2, \ldots, n$, are statistically independent and identically 
distributed gaussian random variables with zero mean and variance $\sigma^2$. As a 
consequence of the statistical independence of the $X_i$, the characteristic 
function of Y is 

$$ \psi_Y(jv) = \frac{1}{(1 - j2v\sigma^2)^{n/2}} \tag{2-1-109} $$

The inverse transform of this characteristic function yields the pdf 

$$ p_Y(y) = \frac{1}{\sigma^n 2^{n/2}\, \Gamma(\tfrac{1}{2}n)}\, y^{n/2 - 1} e^{-y/2\sigma^2}, \qquad y \ge 0 \tag{2-1-110} $$

where $\Gamma(p)$ is the gamma function, defined as 

$$ \Gamma(p) = \int_0^{\infty} t^{p-1} e^{-t}\, dt, \qquad p > 0 $$
$$ \Gamma(p) = (p - 1)!, \qquad p \text{ an integer},\; p > 0 \tag{2-1-111} $$
$$ \Gamma(\tfrac{1}{2}) = \sqrt{\pi}, \qquad \Gamma(\tfrac{3}{2}) = \tfrac{1}{2}\sqrt{\pi} $$
This pdf, which is a generalization of (2-1-105), is called a chi-square (or 
gamma) pdf with n degrees of freedom. It is illustrated in Fig. 2-1-9. The case 
n = 2 yields the exponential distribution. 

FIGURE 2-1-9 The pdf of a chi-square-distributed random variable for several degrees of freedom. 

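The first two moments of Y are 

$$ E(Y) = n\sigma^2, \qquad E(Y^2) = 2n\sigma^4 + n^2\sigma^4, \qquad \sigma_y^2 = 2n\sigma^4 \tag{2-1-112} $$

The cdf of Y is 

$$ F_Y(y) = \int_0^y \frac{1}{\sigma^n 2^{n/2}\, \Gamma(\tfrac{1}{2}n)}\, u^{n/2 - 1} e^{-u/2\sigma^2}\, du, \qquad y \ge 0 \tag{2-1-113} $$

This integral can be easily manipulated into the form of the incomplete gamma 
function, which is tabulated by Pearson (1965). When n is even, the integral in 
(2-1-113) can be expressed in closed form. Specifically, let $m = \tfrac{1}{2}n$, where m is 
an integer. Then, by repeated integration by parts, we obtain 

$$ F_Y(y) = 1 - e^{-y/2\sigma^2} \sum_{k=0}^{m-1} \frac{1}{k!} \left( \frac{y}{2\sigma^2} \right)^k, \qquad y \ge 0 \tag{2-1-114} $$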
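The closed-form cdf for an even number of degrees of freedom can be compared with a direct numerical integration of the central chi-square pdf. The sketch below does this for n = 4 with illustrative values of the variance and of y. The midpoint-rule helper `integrate` and the function names are assumptions, not anything defined in the text.

```python
import math

def chi2_pdf(y, n, sigma):
    """Central chi-square pdf with n degrees of freedom, per-component variance sigma^2."""
    if y <= 0.0:
        return 0.0
    norm = sigma ** n * 2.0 ** (n / 2.0) * math.gamma(n / 2.0)
    return y ** (n / 2.0 - 1.0) * math.exp(-y / (2.0 * sigma ** 2)) / norm

def chi2_cdf_even(y, n, sigma):
    """Closed-form cdf for even n = 2m: 1 - exp(-a) * sum_{k<m} a^k / k!, a = y / (2 sigma^2)."""
    m = n // 2
    a = y / (2.0 * sigma ** 2)
    return 1.0 - math.exp(-a) * sum(a ** k / math.factorial(k) for k in range(m))

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

n, sigma, y = 4, 1.5, 5.0   # arbitrary example values
numeric = integrate(lambda u: chi2_pdf(u, n, sigma), 0.0, y)
closed = chi2_cdf_even(y, n, sigma)
print(numeric, closed)
```

The two values should agree to about six decimal places. For odd n the closed form does not apply, and the incomplete gamma function mentioned above would be needed instead.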

Let us now consider a noncentral chi-square distribution, which results from 
squaring a gaussian random variable having a nonzero mean. If X is gaussian 
with mean $m_x$ and variance $\sigma^2$, the random variable $Y = X^2$ has the pdf 

$$ p_Y(y) = \frac{1}{\sqrt{2\pi y}\,\sigma}\, e^{-(y + m_x^2)/2\sigma^2} \cosh\!\left( \frac{\sqrt{y}\, m_x}{\sigma^2} \right), \qquad y \ge 0 \tag{2-1-115} $$

which is obtained by applying the result in (2-1-47) to the gaussian pdf given by 
(2-1-92). The characteristic function corresponding to this pdf is 

$$ \psi_Y(jv) = \frac{1}{(1 - j2v\sigma^2)^{1/2}}\, e^{jm_x^2 v/(1 - j2v\sigma^2)} \tag{2-1-116} $$
To generalize these results, let Y be the sum of squares of gaussian random 
variables as defined by (2-1-108). The $X_i$, $i = 1, 2, \ldots, n$, are assumed to be 
statistically independent with means $m_i$, $i = 1, 2, \ldots, n$, and identical variances 
equal to $\sigma^2$. Then the characteristic function of Y, obtained from (2-1-116) by 
applying the relation in (2-1-79), is 

$$ \psi_Y(jv) = \frac{1}{(1 - j2v\sigma^2)^{n/2}} \exp\!\left( \frac{jv \sum_{i=1}^{n} m_i^2}{1 - j2v\sigma^2} \right) \tag{2-1-117} $$

This characteristic function can be inverse-Fourier-transformed to yield the pdf 

$$ p_Y(y) = \frac{1}{2\sigma^2} \left( \frac{y}{s^2} \right)^{(n-2)/4} e^{-(s^2 + y)/2\sigma^2}\, I_{n/2 - 1}\!\left( \frac{\sqrt{y}\, s}{\sigma^2} \right), \qquad y \ge 0 \tag{2-1-118} $$


where, by definition, 

$$ s^2 = \sum_{i=1}^{n} m_i^2 \tag{2-1-119} $$

and $I_\alpha(x)$ is the αth-order modified Bessel function of the first kind, which may 
be represented by the infinite series 

$$ I_\alpha(x) = \sum_{k=0}^{\infty} \frac{(x/2)^{\alpha + 2k}}{k!\, \Gamma(\alpha + k + 1)}, \qquad x \ge 0 \tag{2-1-120} $$

The pdf given by (2-1-118) is called the noncentral chi-square pdf with n 
degrees of freedom. The parameter $s^2$ is called the noncentrality parameter of 
the distribution. 

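The cdf of the noncentral chi square with n degrees of freedom is 

$$ F_Y(y) = \int_0^y \frac{1}{2\sigma^2} \left( \frac{u}{s^2} \right)^{(n-2)/4} e^{-(s^2 + u)/2\sigma^2}\, I_{n/2 - 1}\!\left( \frac{\sqrt{u}\, s}{\sigma^2} \right) du \tag{2-1-121} $$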


There is no closed-form expression for this integral. However, when $m = \tfrac{1}{2}n$ is 
an integer, the cdf can be expressed in terms of the generalized Marcum's Q 
function, which is defined as 

$$ Q_m(a, b) = \int_b^{\infty} x \left( \frac{x}{a} \right)^{m-1} e^{-(x^2 + a^2)/2}\, I_{m-1}(ax)\, dx = Q_1(a, b) + e^{-(a^2 + b^2)/2} \sum_{k=1}^{m-1} \left( \frac{b}{a} \right)^k I_k(ab) \tag{2-1-122} $$

where 

$$ Q_1(a, b) = e^{-(a^2 + b^2)/2} \sum_{k=0}^{\infty} \left( \frac{a}{b} \right)^k I_k(ab), \qquad b > a > 0 \tag{2-1-123} $$

If we change the variable of integration in (2-1-121) from u to x, where 

$$ x^2 = u/\sigma^2 $$

and let $a^2 = s^2/\sigma^2$, then it is easily shown that 

$$ F_Y(y) = 1 - Q_m\!\left( \frac{s}{\sigma}, \frac{\sqrt{y}}{\sigma} \right) \tag{2-1-124} $$

Finally, we state that the first two moments of a noncentral chi-square-distributed 
random variable are 

$$ E(Y) = n\sigma^2 + s^2, \qquad E(Y^2) = 2n\sigma^4 + 4\sigma^2 s^2 + (n\sigma^2 + s^2)^2, \qquad \sigma_y^2 = 2n\sigma^4 + 4\sigma^2 s^2 \tag{2-1-125} $$
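The stated moments can be checked against the noncentral chi-square pdf by numerical integration. The sketch below is a rough illustration under assumed values (n = 2, unit variance, noncentrality parameter 2): it evaluates the modified Bessel function by truncating its power series, builds the pdf, and integrates with a midpoint rule. The number of series terms and the integration range are arbitrary choices that are adequate only for these particular values.

```python
import math

def bessel_i(alpha, x, terms=60):
    """Modified Bessel function of the first kind via its truncated power series."""
    return sum((x / 2.0) ** (alpha + 2 * k) / (math.factorial(k) * math.gamma(alpha + k + 1))
               for k in range(terms))

def ncx2_pdf(y, n, sigma, s):
    """Noncentral chi-square pdf with n degrees of freedom and noncentrality s^2."""
    if y <= 0.0:
        return 0.0
    return (1.0 / (2.0 * sigma ** 2)
            * (y / s ** 2) ** ((n - 2) / 4.0)
            * math.exp(-(s ** 2 + y) / (2.0 * sigma ** 2))
            * bessel_i(n / 2.0 - 1.0, math.sqrt(y) * s / sigma ** 2))

n, sigma, s = 2, 1.0, math.sqrt(2.0)   # assumed example values, so s^2 = 2

# Midpoint rule on [0, 80]; the pdf is negligible beyond this range for these values
steps, hi = 4000, 80.0
h = hi / steps
points = [(k + 0.5) * h for k in range(steps)]
values = [ncx2_pdf(y, n, sigma, s) for y in points]

total = h * sum(values)
mean = h * sum(y * p for y, p in zip(points, values))
second = h * sum(y * y * p for y, p in zip(points, values))

mean_formula = n * sigma ** 2 + s ** 2
second_formula = 2 * n * sigma ** 4 + 4 * sigma ** 2 * s ** 2 + mean_formula ** 2
print(total, mean, mean_formula, second, second_formula)
```

For these values the formulas give a mean of 4 and a second moment of 28, and the numerical integrals should land close to both.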


Rayleigh Distribution The Rayleigh distribution is frequently used to 
model the statistics of signals transmitted through radio channels such as 
cellular radio. This distribution is closely related to the central chi-square 
distribution. To illustrate this point, let $Y = X_1^2 + X_2^2$, where $X_1$ and $X_2$ are 
zero-mean statistically independent gaussian random variables, each having a 
variance $\sigma^2$. From the discussion above, it follows that Y is chi-square-distributed 
with two degrees of freedom. Hence, the pdf of Y is 

$$ p_Y(y) = \frac{1}{2\sigma^2}\, e^{-y/2\sigma^2}, \qquad y \ge 0 \tag{2-1-126} $$

Now, suppose we define a new random variable 

$$ R = \sqrt{X_1^2 + X_2^2} = \sqrt{Y} \tag{2-1-127} $$

Making a simple change of variable in the pdf of (2-1-126), we obtain the pdf 
of R in the form 

$$ p_R(r) = \frac{r}{\sigma^2}\, e^{-r^2/2\sigma^2}, \qquad r \ge 0 \tag{2-1-128} $$

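This is the pdf of a Rayleigh-distributed random variable. The corresponding 
cdf is 

$$ F_R(r) = \int_0^r \frac{u}{\sigma^2}\, e^{-u^2/2\sigma^2}\, du = 1 - e^{-r^2/2\sigma^2}, \qquad r \ge 0 \tag{2-1-129} $$

The moments of R are 

$$ E(R^k) = (2\sigma^2)^{k/2}\, \Gamma(1 + \tfrac{1}{2}k) \tag{2-1-130} $$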
and the variance is 

$$ \sigma_r^2 = (2 - \tfrac{1}{2}\pi)\,\sigma^2 \tag{2-1-131} $$
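A short numerical check ties the Rayleigh pdf, cdf, moment formula, and variance together. The sketch below assumes the illustrative value 2 for the standard deviation of the underlying gaussians and uses a midpoint rule for the integrals. All function names are hypothetical.

```python
import math

def rayleigh_pdf(r, sigma):
    """p_R(r) = (r / sigma^2) exp(-r^2 / (2 sigma^2)) for r >= 0."""
    return r / sigma ** 2 * math.exp(-r * r / (2.0 * sigma ** 2)) if r >= 0.0 else 0.0

def rayleigh_cdf(r, sigma):
    """F_R(r) = 1 - exp(-r^2 / (2 sigma^2)) for r >= 0."""
    return 1.0 - math.exp(-r * r / (2.0 * sigma ** 2)) if r >= 0.0 else 0.0

def rayleigh_moment(k, sigma):
    """E(R^k) = (2 sigma^2)^(k/2) * Gamma(1 + k/2)."""
    return (2.0 * sigma ** 2) ** (k / 2.0) * math.gamma(1.0 + k / 2.0)

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

sigma = 2.0                  # assumed example value
upper = 12.0 * sigma         # the pdf is negligible beyond this point

mean_numeric = integrate(lambda r: r * rayleigh_pdf(r, sigma), 0.0, upper)
cdf_numeric = integrate(lambda r: rayleigh_pdf(r, sigma), 0.0, 3.0)
variance_from_moments = rayleigh_moment(2, sigma) - rayleigh_moment(1, sigma) ** 2
variance_formula = (2.0 - math.pi / 2.0) * sigma ** 2

print(mean_numeric, rayleigh_moment(1, sigma), variance_from_moments, variance_formula)
```

The mean from integration should match the k = 1 moment, and the variance built from the first two moments should match the closed-form variance.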

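The characteristic function of the Rayleigh-distributed random variable is 

$$ \psi_R(jv) = \int_0^{\infty} \frac{r}{\sigma^2}\, e^{-r^2/2\sigma^2}\, e^{jvr}\, dr \tag{2-1-132} $$

This integral may be expressed as 

$$ \psi_R(jv) = \int_0^{\infty} \frac{r}{\sigma^2}\, e^{-r^2/2\sigma^2} \cos vr\, dr + j \int_0^{\infty} \frac{r}{\sigma^2}\, e^{-r^2/2\sigma^2} \sin vr\, dr = {}_1F_1(1, \tfrac{1}{2}; -\tfrac{1}{2}v^2\sigma^2) + j\sqrt{\tfrac{1}{2}\pi}\; v\sigma\, e^{-v^2\sigma^2/2} \tag{2-1-133} $$

where ${}_1F_1(1, \tfrac{1}{2}; -a)$ is the confluent hypergeometric function, which is defined 
as 

$$ {}_1F_1(\alpha, \beta; x) = \sum_{k=0}^{\infty} \frac{\Gamma(\alpha + k)\,\Gamma(\beta)\, x^k}{\Gamma(\alpha)\,\Gamma(\beta + k)\, k!}, \qquad \beta \ne 0, -1, -2, \ldots \tag{2-1-134} $$

Beaulieu (1990) has shown that ${}_1F_1(1, \tfrac{1}{2}; -a)$ may be expressed as 

$$ {}_1F_1(1, \tfrac{1}{2}; -a) = -e^{-a} \sum_{k=0}^{\infty} \frac{a^k}{(2k - 1)\, k!} \tag{2-1-135} $$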

As a generalization of the above expression, consider the random variable 

$$ R = \sqrt{\sum_{i=1}^{n} X_i^2} \tag{2-1-136} $$

where the $X_i$, $i = 1, 2, \ldots, n$, are statistically independent, identically distributed 
zero mean gaussian random variables. The random variable R has a 
generalized Rayleigh distribution. Clearly, $Y = R^2$ is chi-square-distributed 
with n degrees of freedom. Its pdf is given by (2-1-110). A simple change in 
variable in (2-1-110) yields the pdf of R in the form 

$$ p_R(r) = \frac{r^{n-1}}{2^{(n-2)/2}\, \sigma^n\, \Gamma(\tfrac{1}{2}n)}\, e^{-r^2/2\sigma^2}, \qquad r \ge 0 \tag{2-1-137} $$


As a consequence of the functional relationship between the central 
chi-square and the Rayleigh distributions, the corresponding cdfs are similar. 
Thus, for any n, the cdf of R can be put in the form of the incomplete gamma 
function. In the special case when n is even, i.e., n = 2m, the cdf of R can be 
expressed in the closed form 

$$ F_R(r) = 1 - e^{-r^2/2\sigma^2} \sum_{k=0}^{m-1} \frac{1}{k!} \left( \frac{r^2}{2\sigma^2} \right)^k, \qquad r \ge 0 \tag{2-1-138} $$
Finally, we state that the kth moment of R is 

$$ E(R^k) = (2\sigma^2)^{k/2}\, \frac{\Gamma(\tfrac{1}{2}(n + k))}{\Gamma(\tfrac{1}{2}n)}, \qquad k \ge 0 \tag{2-1-139} $$

which holds for any integer n. 


Rice Distribution Just as the Rayleigh distribution is related to the central 
chi-square distribution, the Rice distribution is related to the noncentral 
chi-square distribution. To illustrate this relation, let $Y = X_1^2 + X_2^2$, where $X_1$ 
and $X_2$ are statistically independent gaussian random variables with means $m_i$, 
i = 1, 2, and common variance $\sigma^2$. From the previous discussion, we know that 
Y has a noncentral chi-square distribution with noncentrality parameter 
$s^2 = m_1^2 + m_2^2$. The pdf of Y, obtained from (2-1-118) for n = 2, is 

$$ p_Y(y) = \frac{1}{2\sigma^2}\, e^{-(s^2 + y)/2\sigma^2}\, I_0\!\left( \frac{\sqrt{y}\, s}{\sigma^2} \right), \qquad y \ge 0 \tag{2-1-140} $$

Now, we define a new random variable $R = \sqrt{Y}$. The pdf of R, obtained 
from (2-1-140) by a simple change of variable, is 

$$ p_R(r) = \frac{r}{\sigma^2}\, e^{-(r^2 + s^2)/2\sigma^2}\, I_0\!\left( \frac{rs}{\sigma^2} \right), \qquad r \ge 0 \tag{2-1-141} $$

This is the pdf of a Ricean-distributed random variable. As will be shown in 
Chapter 5, this pdf characterizes the statistics of the envelope of a signal 
corrupted by additive narrowband gaussian noise. It is also used to model the 
statistics of signals transmitted through some radio channels. The cdf of 
R is easily obtained by specializing the results in (2-1-124) to the case m = 1. 
This yields 

$$ F_R(r) = 1 - Q_1\!\left( \frac{s}{\sigma}, \frac{r}{\sigma} \right), \qquad r \ge 0 \tag{2-1-142} $$

where $Q_1(a, b)$ is defined by (2-1-123). 

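As a generalization of the expressions given above, let R be defined as in 
(2-1-136) where the $X_i$, $i = 1, 2, \ldots, n$, are statistically independent gaussian 
random variables with means $m_i$, $i = 1, 2, \ldots, n$, and identical variances equal 
to $\sigma^2$. The random variable $R^2 = Y$ has a noncentral chi-square distribution 
with n degrees of freedom and noncentrality parameter $s^2$ given by (2-1-119). 
Its pdf is given by (2-1-118). Hence the pdf of R is 

$$ p_R(r) = \frac{r^{n/2}}{\sigma^2 s^{(n-2)/2}}\, e^{-(r^2 + s^2)/2\sigma^2}\, I_{n/2 - 1}\!\left( \frac{rs}{\sigma^2} \right), \qquad r \ge 0 \tag{2-1-143} $$

and the corresponding cdf is 

$$ F_R(r) = P(R \le r) = P(\sqrt{Y} \le r) = P(Y \le r^2) = F_Y(r^2) \tag{2-1-144} $$

where $F_Y(r^2)$ is given by (2-1-121). In the special case where $m = \tfrac{1}{2}n$ is an 
integer, we have 

$$ F_R(r) = 1 - Q_m\!\left( \frac{s}{\sigma}, \frac{r}{\sigma} \right), \qquad r \ge 0 \tag{2-1-145} $$

which follows from (2-1-124). Finally, we state that the kth moment of R is 

$$ E(R^k) = (2\sigma^2)^{k/2}\, e^{-s^2/2\sigma^2}\, \frac{\Gamma(\tfrac{1}{2}(n + k))}{\Gamma(\tfrac{1}{2}n)}\; {}_1F_1\!\left( \tfrac{1}{2}(n + k), \tfrac{1}{2}n; \frac{s^2}{2\sigma^2} \right), \qquad k \ge 0 \tag{2-1-146} $$

where ${}_1F_1(\alpha, \beta; x)$ is the confluent hypergeometric function. 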


Nakagami m-Distribution Both the Rayleigh distribution and the Rice 
distribution are frequently used to describe the statistical fluctuations of signals 
received from a multipath fading channel. These channel models are considered 
in Chapter 14. Another distribution that is frequently used to 
characterize the statistics of signals transmitted through multipath fading 
channels is the Nakagami m-distribution. The pdf for this distribution is given 
by Nakagami (1960) as 

$$ p_R(r) = \frac{2}{\Gamma(m)} \left( \frac{m}{\Omega} \right)^m r^{2m - 1} e^{-mr^2/\Omega} \tag{2-1-147} $$

where $\Omega$ is defined as 

$$ \Omega = E(R^2) \tag{2-1-148} $$

and the parameter m is defined as the ratio of moments, called the fading 
figure, 

$$ m = \frac{\Omega^2}{E[(R^2 - \Omega)^2]}, \qquad m \ge \tfrac{1}{2} \tag{2-1-149} $$

A normalized version of (2-1-147) may be obtained by defining another 
random variable $X = R/\sqrt{\Omega}$ (see Problem 2-15). The nth moment of R is 

$$ E(R^n) = \frac{\Gamma(m + \tfrac{1}{2}n)}{\Gamma(m)} \left( \frac{\Omega}{m} \right)^{n/2} $$

By setting m = 1, we observe that (2-1-147) reduces to a Rayleigh pdf. For 
values of m in the range $\tfrac{1}{2} \le m \le 1$, we obtain pdfs that have larger tails than a 
Rayleigh-distributed random variable. For values of m > 1, the tail of the pdf 
decays faster than that of the Rayleigh. Figure 2-1-10 illustrates the pdfs for 
different values of m. 

FIGURE 2-1-10 The m-distributed pdf, shown with $\Omega = 1$; m is the fading figure. (Miyagaki et al. 1978.) 
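The Nakagami pdf and its relation to the Rayleigh pdf can be checked with a few lines of code. The sketch below is an illustration with arbitrary parameter values: it verifies that m = 1 reproduces a Rayleigh pdf with $2\sigma^2 = \Omega$, and that the second moment obtained by numerical integration equals $\Omega$ for a non-integer m. The helper names are hypothetical.

```python
import math

def nakagami_pdf(r, m, omega):
    """Nakagami m-distribution pdf; omega = E(R^2) and m is the fading figure."""
    if r < 0.0:
        return 0.0
    return 2.0 / math.gamma(m) * (m / omega) ** m * r ** (2.0 * m - 1.0) * math.exp(-m * r * r / omega)

def rayleigh_pdf(r, sigma2):
    """Rayleigh pdf with per-component variance sigma2."""
    return r / sigma2 * math.exp(-r * r / (2.0 * sigma2))

def nakagami_moment(n, m, omega):
    """E(R^n) = Gamma(m + n/2) / Gamma(m) * (omega / m)^(n/2)."""
    return math.gamma(m + n / 2.0) / math.gamma(m) * (omega / m) ** (n / 2.0)

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

# m = 1 should coincide with a Rayleigh pdf whose 2*sigma^2 equals omega
r0, omega0 = 0.8, 2.0
nakagami_value = nakagami_pdf(r0, 1.0, omega0)
rayleigh_value = rayleigh_pdf(r0, omega0 / 2.0)

# Second moment for an arbitrary fading figure, by integration and by formula
m, omega = 2.5, 3.0
second_numeric = integrate(lambda r: r * r * nakagami_pdf(r, m, omega), 0.0, 20.0)
second_formula = nakagami_moment(2, m, omega)
print(nakagami_value, rayleigh_value, second_numeric, second_formula)
```

Both second-moment values should equal the assumed value of 3 for $\Omega$, consistent with its definition as $E(R^2)$.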


Multivariate Gaussian Distribution Of the many multivariate or multi- 
dimensional distributions that can be defined, the multivariate gaussian 
distribution is the most important and the one most likely to be encountered in 
practice. We shall briefly introduce this distribution and state its basic 
properties. 

Let us assume that $X_i$, $i = 1, 2, \ldots, n$, are gaussian random variables with 
means $m_i$, $i = 1, 2, \ldots, n$, variances $\sigma_i^2$, $i = 1, 2, \ldots, n$, and covariances $\mu_{ij}$, 
$i, j = 1, 2, \ldots, n$. Clearly, $\mu_{ii} = \sigma_i^2$, $i = 1, 2, \ldots, n$. Let M denote the $n \times n$ 
covariance matrix with elements $\{\mu_{ij}\}$, let X denote the $n \times 1$ column vector of 
random variables, and let $m_x$ denote the $n \times 1$ column vector of mean values 
$m_i$, $i = 1, 2, \ldots, n$. The joint pdf of the gaussian random variables $X_i$, 
$i = 1, 2, \ldots, n$, is defined as 

$$ p(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{n/2} (\det M)^{1/2}} \exp\!\left[ -\tfrac{1}{2} (x - m_x)' M^{-1} (x - m_x) \right] \tag{2-1-150} $$

where $M^{-1}$ denotes the inverse of M and x' denotes the transpose of x. 

The characteristic function corresponding to this n-dimensional joint pdf is 

$$ \psi(jv) = E(e^{jv'X}) $$

where v is an n-dimensional vector with elements $v_i$, $i = 1, 2, \ldots, n$. 
Evaluation of this n-dimensional Fourier transform yields the result 

$$ \psi(jv) = \exp\!\left( jm_x' v - \tfrac{1}{2} v' M v \right) \tag{2-1-151} $$

An important special case of (2-1-150) is the bivariate or two-dimensional 
gaussian pdf. The mean $m_x$ and the covariance matrix M for this case are 

$$ m_x = \begin{bmatrix} m_1 \\ m_2 \end{bmatrix}, \qquad M = \begin{bmatrix} \sigma_1^2 & \mu_{12} \\ \mu_{12} & \sigma_2^2 \end{bmatrix} \tag{2-1-152} $$

where the joint central moment $\mu_{12}$ is defined as 

$$ \mu_{12} = E[(X_1 - m_1)(X_2 - m_2)] $$

It is convenient to define a normalized covariance 

$$ \rho_{ij} = \frac{\mu_{ij}}{\sigma_i \sigma_j}, \qquad i \ne j \tag{2-1-153} $$

where $\rho_{ij}$ satisfies the condition $0 \le |\rho_{ij}| \le 1$. When dealing with the two-dimensional 
case, it is customary to drop the subscripts on $\mu_{12}$ and $\rho_{12}$. Hence 
the covariance matrix is expressed as 

$$ M = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} \tag{2-1-154} $$

Its inverse is 

$$ M^{-1} = \frac{1}{\sigma_1^2 \sigma_2^2 (1 - \rho^2)} \begin{bmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix} \tag{2-1-155} $$

and $\det M = \sigma_1^2 \sigma_2^2 (1 - \rho^2)$. Substitution for $M^{-1}$ into (2-1-150) yields the 
desired bivariate gaussian pdf in the form 


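$$ p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\!\left[ -\frac{\sigma_2^2 (x_1 - m_1)^2 - 2\rho\sigma_1\sigma_2 (x_1 - m_1)(x_2 - m_2) + \sigma_1^2 (x_2 - m_2)^2}{2\sigma_1^2 \sigma_2^2 (1 - \rho^2)} \right] \tag{2-1-156} $$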


We note that when $\rho = 0$, the joint pdf $p(x_1, x_2)$ in (2-1-156) factors into the 
product $p(x_1)p(x_2)$, where $p(x_i)$, i = 1, 2, are the marginal pdfs. Since $\rho$ is a 
measure of the correlation between $X_1$ and $X_2$, we have shown that when the 
gaussian random variables $X_1$ and $X_2$ are uncorrelated, they are also 
statistically independent. This is an important property of gaussian random 
variables, which does not hold in general for other distributions. It extends to 
n-dimensional gaussian random variables in a straightforward manner. That is, 
if $\rho_{ij} = 0$ for $i \ne j$ then the random variables $X_i$, $i = 1, 2, \ldots, n$, are uncorrelated 
and, hence, statistically independent. 
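The factorization of the bivariate gaussian pdf at zero correlation is simple to confirm numerically. The sketch below codes the bivariate pdf and the one-dimensional gaussian pdf, then compares the joint pdf with the product of marginals at an arbitrary point, once with a zero and once with a nonzero normalized covariance. The parameter values and function names are illustrative assumptions.

```python
import math

def gauss_pdf(x, m, sigma):
    """One-dimensional gaussian pdf with mean m and standard deviation sigma."""
    return math.exp(-(x - m) ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def bivariate_pdf(x1, x2, m1, m2, s1, s2, rho):
    """Bivariate gaussian pdf in terms of means, standard deviations, and rho."""
    d1, d2 = x1 - m1, x2 - m2
    q = (s2 ** 2 * d1 ** 2 - 2.0 * rho * s1 * s2 * d1 * d2 + s1 ** 2 * d2 ** 2) \
        / (2.0 * s1 ** 2 * s2 ** 2 * (1.0 - rho ** 2))
    return math.exp(-q) / (2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho ** 2))

# Arbitrary example parameters and evaluation point
m1, m2, s1, s2 = 1.0, -0.5, 2.0, 0.7
x1, x2 = 0.3, 0.2

product = gauss_pdf(x1, m1, s1) * gauss_pdf(x2, m2, s2)
joint_uncorrelated = bivariate_pdf(x1, x2, m1, m2, s1, s2, 0.0)
joint_correlated = bivariate_pdf(x1, x2, m1, m2, s1, s2, 0.6)

print(joint_uncorrelated, product, joint_correlated)
```

With zero normalized covariance the joint value equals the product of the marginals. With a value of 0.6 it does not, as expected for correlated gaussian variables.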

Now, let us consider a linear transformation of n gaussian random variables 
$X_i$, $i = 1, 2, \ldots, n$, with mean vector $m_x$ and covariance matrix M. Let 

$$ Y = AX \tag{2-1-157} $$

where A is a nonsingular matrix. As shown previously, the jacobian of this 
transformation is $J = 1/\det A$. Since $X = A^{-1}Y$, we may substitute for X in 
(2-1-150) and, thus, we obtain the joint pdf of Y in the form 

$$ p(y) = \frac{1}{(2\pi)^{n/2} (\det M)^{1/2}\, |\det A|} \exp\!\left[ -\tfrac{1}{2} (A^{-1}y - m_x)' M^{-1} (A^{-1}y - m_x) \right] = \frac{1}{(2\pi)^{n/2} (\det Q)^{1/2}} \exp\!\left[ -\tfrac{1}{2} (y - m_y)' Q^{-1} (y - m_y) \right] \tag{2-1-158} $$

where the vector $m_y$ and the matrix Q are defined as 

$$ m_y = Am_x, \qquad Q = AMA' \tag{2-1-159} $$


Thus we have shown that a linear transformation of a set of jointly gaussian 
random variables results in another set of jointly gaussian random variables. 

Suppose that we wish to perform a linear transformation that results in n 
statistically independent gaussian random variables. How should the matrix A 
be selected? From our previous discussion, we know that the gaussian random 
variables are statistically independent if they are pairwise-uncorrelated, i.e., if 
the covariance matrix Q is diagonal. Therefore, we must have 

$$ AMA' = D \tag{2-1-160} $$

where D is a diagonal matrix. The matrix M is a covariance matrix; hence, it is 
positive definite. One solution is to select A to be an orthogonal matrix 
($A' = A^{-1}$) whose rows are the eigenvectors of the covariance 
matrix M. Then D is a diagonal matrix with diagonal elements equal to the 
eigenvalues of M. 


Example 2*1-5 

Consider the bivariate gaussian pdf with covariance matrix 



Let us determine the transformation A that will result in uncorrelated 
random variables. First, we solve for the eigenvalues of M. The characteris- 
tic equation is 

det (M - AI) = 0 
(l-A) 2 -i = 0 

A-J.4 


Next we determine the two eigenvectors. If a denotes an eigenvector, we 
have 

(M - Al)a = 0 


With A, = \ and A 2 = we obtain the eigenvectors 



Therefore, 



It is easily verified that $\mathbf{A}^{-1} = \mathbf{A}'$ and that

$$\mathbf{A}\mathbf{M}\mathbf{A}' = \mathbf{D}$$

where the diagonal elements of $\mathbf{D}$ are $\tfrac{1}{2}$ and $\tfrac{3}{2}$.

2-1-5 Upper Bounds on the Tail Probability

In evaluating the performance of a digital communication system, it is often 
necessary to determine the area under the tail of the pdf. We refer to this area 
as the tail probability. In this section, we present two upper bounds on the tail 
probability. The first, obtained from the Chebyshev inequality, is rather loose. 
The second, called the Chernoff bound, is much tighter. 


Chebyshev Inequality  Suppose that X is an arbitrary random variable with
finite mean $m_x$ and finite variance $\sigma_x^2$. For any positive number $\delta$,

$$P(|X - m_x| \ge \delta) \le \frac{\sigma_x^2}{\delta^2} \tag{2-1-161}$$


This relation is called the Chebyshev inequality. The proof of this bound is 
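relatively simple. We have

$$\begin{aligned}
\sigma_x^2 = \int_{-\infty}^{\infty} (x - m_x)^2 p(x)\,dx &\ge \int_{|x - m_x| \ge \delta} (x - m_x)^2 p(x)\,dx \\
&\ge \delta^2 \int_{|x - m_x| \ge \delta} p(x)\,dx = \delta^2 P(|X - m_x| \ge \delta)
\end{aligned}$$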

Thus the validity of the inequality is established. 

It is apparent that the Chebyshev inequality is simply an upper bound on
the area under the tails of the pdf $p(y)$, where $Y = X - m_x$, i.e., the area of
$p(y)$ in the intervals $(-\infty, -\delta)$ and $(\delta, \infty)$. Hence, the Chebyshev inequality
may be expressed as

$$1 - [F_Y(\delta) - F_Y(-\delta)] \le \frac{\sigma_x^2}{\delta^2} \tag{2-1-162}$$

or, equivalently, as

$$1 - [F_X(m_x + \delta) - F_X(m_x - \delta)] \le \frac{\sigma_x^2}{\delta^2} \tag{2-1-163}$$


There is another way to view the Chebyshev bound. Working with the zero
mean random variable $Y = X - m_x$, for convenience, suppose we define a
function $g(Y)$ as

$$g(Y) = \begin{cases} 1 & (|Y| \ge \delta) \\ 0 & (|Y| < \delta) \end{cases} \tag{2-1-164}$$

Since $g(Y)$ is either 0 or 1 with probabilities $P(|Y| < \delta)$ and $P(|Y| \ge \delta)$,
respectively, its mean value is

$$E[g(Y)] = P(|Y| \ge \delta) \tag{2-1-165}$$

FIGURE 2-1-11 A quadratic upper bound on g(Y) used in obtaining the tail
probability (Chebyshev bound).

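Now suppose that we upper-bound $g(Y)$ by the quadratic $(Y/\delta)^2$, i.e.,

$$g(Y) \le \left(\frac{Y}{\delta}\right)^2 \tag{2-1-166}$$

The graph of $g(Y)$ and the upper bound are shown in Fig. 2-1-11. It follows
that

$$E[g(Y)] \le E\left(\frac{Y^2}{\delta^2}\right) = \frac{E(Y^2)}{\delta^2} = \frac{\sigma_y^2}{\delta^2} = \frac{\sigma_x^2}{\delta^2}$$

Since $E[g(Y)]$ is the tail probability, as seen from (2-1-165), we have obtained
the Chebyshev bound.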

For many practical applications, the Chebyshev bound is extremely loose.
The reason for this may be attributed to the looseness of the quadratic $(Y/\delta)^2$
in overbounding $g(Y)$. There are certainly many other functions that can be
used to overbound $g(Y)$. Below, we use an exponential bound to derive an
upper bound on the tail probability that is extremely tight.
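To give a feel for how loose the Chebyshev bound can be, the sketch below compares it with an exact two-sided tail probability. The zero-mean, unit-variance gaussian is an assumption made only for this illustration (the text does not single out a distribution here), and the function names are hypothetical.

```python
import math

def gaussian_two_sided_tail(delta):
    """Exact P(|Y| >= delta) for a zero-mean, unit-variance gaussian Y."""
    return math.erfc(delta / math.sqrt(2.0))

def chebyshev_bound(variance, delta):
    """Chebyshev upper bound on P(|Y| >= delta), as in (2-1-161)."""
    return variance / delta ** 2

for delta in (1.0, 2.0, 3.0, 4.0):
    exact = gaussian_two_sided_tail(delta)
    bound = chebyshev_bound(1.0, delta)
    print(f"delta={delta:.0f}  exact={exact:.3e}  Chebyshev={bound:.3e}")
```

The bound always lies above the exact value, as it must, but the gap widens rapidly with increasing delta because the bound decays only as the inverse square of delta.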


Chernoff Bound  The Chebyshev bound given above involves the area
under the two tails of the pdf. In some applications we are interested only in
the area under one tail, either in the interval $(\delta, \infty)$ or in the interval $(-\infty, \delta)$.
In such a case we can obtain an extremely tight upper bound by overbounding
the function $g(Y)$ by an exponential having a parameter that can be optimized
to yield as tight an upper bound as possible. Specifically, we consider the tail
probability in the interval $(\delta, \infty)$. The function $g(Y)$ is overbounded as

$$g(Y) \le e^{v(Y - \delta)} \tag{2-1-167}$$

where $g(Y)$ is now defined as

$$g(Y) = \begin{cases} 1 & (Y \ge \delta) \\ 0 & (Y < \delta) \end{cases} \tag{2-1-168}$$

and $v \ge 0$ is the parameter to be optimized. The graph of $g(Y)$ and the
exponential upper bound are shown in Fig. 2-1-12.

FIGURE 2-1-12 An exponential upper bound on g(Y) used in obtaining the tail
probability (Chernoff bound).

The expected value of $g(Y)$ is

$$E[g(Y)] = P(Y \ge \delta) \le E(e^{v(Y - \delta)}) \tag{2-1-169}$$

This bound is valid for any $v \ge 0$. The tightest upper bound is obtained by
selecting the value of $v$ that minimizes $E(e^{v(Y - \delta)})$. A necessary condition for a
minimum is

$$\frac{d}{dv} E(e^{v(Y - \delta)}) = 0 \tag{2-1-170}$$

But the order of differentiation and expectation can be interchanged, so that

$$\begin{aligned}
\frac{d}{dv} E(e^{v(Y - \delta)}) &= E\left(\frac{d}{dv} e^{v(Y - \delta)}\right) \\
&= E[(Y - \delta) e^{v(Y - \delta)}] \\
&= e^{-v\delta} [E(Y e^{vY}) - \delta E(e^{vY})] = 0
\end{aligned}$$

Therefore the value of $v$ that gives the tightest upper bound is the solution to
the equation

$$E(Y e^{vY}) - \delta E(e^{vY}) = 0 \tag{2-1-171}$$

Let $\hat{v}$ be the solution of (2-1-171). Then, from (2-1-169), the upper bound on
the one-sided tail probability is

$$P(Y \ge \delta) \le e^{-\hat{v}\delta} E(e^{\hat{v}Y}) \tag{2-1-172}$$

This is the Chernoff bound for the upper tail probability for a discrete or a
continuous random variable having a zero mean.† This bound may be used to
show that $Q(x) \le e^{-x^2/2}$, where $Q(x)$ is the area in the tail of the gaussian pdf
(see Problem 2-18).

† Note that $E(e^{vY})$ for real $v$ is not the characteristic function of Y. It is called the moment
generating function of Y.

FIGURE 2-1-13 The pdf of a Laplace-distributed random variable.

An upper bound on the lower tail probability can be obtained in a similar
manner, with the result that

$$P(Y \le \delta) \le e^{-\hat{v}\delta} E(e^{\hat{v}Y}) \tag{2-1-173}$$

where $\hat{v}$ is the solution to (2-1-171) and $\delta < 0$.
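The gaussian case mentioned above makes a compact check of the upper-tail Chernoff bound. For a zero-mean, unit-variance gaussian Y the moment generating function is $e^{v^2/2}$, so the condition (2-1-171) reduces to $v - \delta = 0$ and the bound becomes $e^{-\delta^2/2}$. The sketch below is an illustration under that assumption; the function names are hypothetical.

```python
import math

def q_function(x):
    """Area under the upper tail of the zero-mean, unit-variance gaussian pdf."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def chernoff_objective(v, delta):
    """e^{-v*delta} E(e^{vY}) for a unit gaussian Y, using E(e^{vY}) = e^{v^2/2}."""
    return math.exp(-v * delta + 0.5 * v * v)

def chernoff_bound_gaussian(delta):
    """Bound (2-1-172) evaluated at the optimizing parameter v = delta."""
    return chernoff_objective(delta, delta)

for delta in (1.0, 2.0, 3.0, 4.0):
    print(f"delta={delta:.0f}  Q={q_function(delta):.3e}  "
          f"Chernoff={chernoff_bound_gaussian(delta):.3e}")
```

Evaluating `chernoff_objective` at values of v slightly above or below delta gives a larger number, which is consistent with v = delta being the minimizing choice.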


Example 2-1-6 

Consider the (Laplace) pdf

$$p(y) = \tfrac{1}{2} e^{-|y|} \tag{2-1-174}$$

which is illustrated in Fig. 2-1-13. Let us evaluate the upper tail probability
from the Chernoff bound and compare it with the true tail probability,
which is

$$P(Y \ge \delta) = \int_{\delta}^{\infty} \tfrac{1}{2} e^{-y}\,dy = \tfrac{1}{2} e^{-\delta} \tag{2-1-175}$$

To solve (2-1-171) for $\hat{v}$, we must determine the moments $E(Y e^{vY})$ and
$E(e^{vY})$. For the pdf in (2-1-174), we find that

$$\begin{aligned}
E(Y e^{vY}) &= \frac{2v}{(v + 1)^2 (v - 1)^2} \\
E(e^{vY}) &= \frac{1}{(1 + v)(1 - v)}
\end{aligned} \tag{2-1-176}$$


Substituting these moments into (2-1-171), we obtain the quadratic equation

$$v^2 \delta + 2v - \delta = 0$$

which has the solutions

$$\hat{v} = \frac{-1 \pm \sqrt{1 + \delta^2}}{\delta} \tag{2-1-177}$$


Since $\hat{v}$ must be positive, one of the two solutions is discarded. Thus

$$\hat{v} = \frac{-1 + \sqrt{1 + \delta^2}}{\delta} \tag{2-1-178}$$

Finally, we evaluate the upper bound in (2-1-172) by eliminating $E(e^{\hat{v}Y})$
using the second relation in (2-1-176) and by substituting for $\hat{v}$ from
(2-1-178). The result is

$$P(Y \ge \delta) \le \frac{\delta^2}{2(-1 + \sqrt{1 + \delta^2})} e^{1 - \sqrt{1 + \delta^2}} \tag{2-1-179}$$

For $\delta \gg 1$, (2-1-179) reduces to

$$P(Y \ge \delta) \le \frac{\delta}{2} e^{-\delta} \tag{2-1-180}$$

We note that the Chernoff bound decreases exponentially as $\delta$ increases.
Consequently, it approximates closely the exact tail probability given by
(2-1-175). In contrast, the Chebyshev upper bound for the upper tail
probability obtained by taking one-half of the probability in the two tails (due
to symmetry in the pdf) is

$$P(Y \ge \delta) \le \frac{1}{\delta^2}$$

Hence, this bound is extremely loose.
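The three quantities in this example can be tabulated side by side. The sketch below is illustrative only: the function names are hypothetical, the moment generating function $1/(1 - v^2)$ is the second relation in (2-1-176), and the one-sided Chebyshev value uses the fact that the Laplace pdf of (2-1-174) has variance 2.

```python
import math

def exact_tail(delta):
    """True upper tail probability (2-1-175) of the Laplace pdf."""
    return 0.5 * math.exp(-delta)

def chernoff_parameter(delta):
    """Positive root (2-1-178) of v^2*delta + 2v - delta = 0."""
    return (-1.0 + math.sqrt(1.0 + delta ** 2)) / delta

def chernoff_bound(delta):
    """Chernoff bound e^{-v*delta} E(e^{vY}) with E(e^{vY}) = 1/(1 - v^2)."""
    v = chernoff_parameter(delta)
    return math.exp(-v * delta) / (1.0 - v * v)

def chebyshev_one_sided(delta):
    """Half of the two-sided Chebyshev bound; the Laplace variance is 2."""
    return 1.0 / delta ** 2

for delta in (2.0, 5.0, 10.0, 20.0):
    print(f"delta={delta:4.0f}  exact={exact_tail(delta):.3e}  "
          f"Chernoff={chernoff_bound(delta):.3e}  "
          f"Chebyshev={chebyshev_one_sided(delta):.3e}")
```

The exponential decay of the Chernoff bound shows up clearly for large delta. For moderate values such as delta = 2 the Chebyshev number is actually the smaller of the two, so the advantage of the Chernoff bound is a large-delta property.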

When the random variable has a nonzero mean, the Chernoff bound can be
extended as we now demonstrate. If $Y = X - m_x$, we have

$$P(Y \ge \delta) = P(X - m_x \ge \delta) = P(X \ge m_x + \delta) = P(X \ge \delta_m)$$

where, by definition, $\delta_m = m_x + \delta$. Since $\delta > 0$, it follows that $\delta_m > m_x$. Let
$g(X)$ be defined as

$$g(X) = \begin{cases} 1 & (X \ge \delta_m) \\ 0 & (X < \delta_m) \end{cases}$$

and upper-bounded as

$$g(X) \le e^{v(X - \delta_m)} \tag{2-1-182}$$

From this point, the derivation parallels the steps contained in (2-1-169) to
(2-1-172). The final result is

$$P(X \ge \delta_m) \le e^{-\hat{v}\delta_m} E(e^{\hat{v}X}) \tag{2-1-183}$$

where $\delta_m > m_x$ and $\hat{v}$ is the solution to the equation

$$E(X e^{vX}) - \delta_m E(e^{vX}) = 0 \tag{2-1-184}$$

In a similar manner, we can obtain the Chernoff bound for the lower tail
probability. For $\delta < 0$, we have

$$P(X - m_x \le \delta) = P(X \le m_x + \delta) = P(X \le \delta_m) \le E(e^{v(X - \delta_m)}) \tag{2-1-185}$$

From our previous development, it is apparent that (2-1-185) results in the
bound

$$P(X \le \delta_m) \le e^{-\hat{v}\delta_m} E(e^{\hat{v}X}) \tag{2-1-186}$$

where $\delta_m < m_x$ and $\hat{v}$ is the solution to (2-1-184).

2-1-6 Sums of Random Variables and the Central Limit Theorem

We have previously considered the problem of determining the pdf of a sum of
n statistically independent random variables. In this section, we again consider
the sum of statistically independent random variables, but our approach is
different and is independent of the particular pdf of the random variables in
the sum. To be specific, suppose that $X_i$, $i = 1, 2, \ldots, n$, are statistically
independent and identically distributed random variables, each having a finite
mean $m_x$ and a finite variance $\sigma_x^2$. Let Y be defined as the normalized sum,
called the sample mean:

$$Y = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{2-1-187}$$

First we shall determine upper bounds on the tail probabilities of Y and then
we shall prove a very important theorem regarding the pdf of Y in the limit as
$n \to \infty$.

The random variable Y defined in (2-1-187) is frequently encountered in
estimating the mean of a random variable X from a number of observations $X_i$,
$i = 1, 2, \ldots, n$. In other words, the $X_i$, $i = 1, 2, \ldots, n$, may be considered as
independent samples drawn from a distribution $F_X(x)$, and Y is the estimate of
the mean $m_x$.

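The mean of Y is

$$E(Y) = m_y = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = m_x$$

The variance of Y is

$$\begin{aligned}
\sigma_y^2 &= E(Y^2) - m_y^2 = E(Y^2) - m_x^2 \\
&= \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} E(X_i X_j) - m_x^2 \\
&= \frac{1}{n^2} \sum_{i=1}^{n} E(X_i^2) + \frac{1}{n^2} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} E(X_i) E(X_j) - m_x^2 \\
&= \frac{1}{n} (\sigma_x^2 + m_x^2) + \frac{1}{n^2} n(n - 1) m_x^2 - m_x^2 \\
&= \frac{\sigma_x^2}{n}
\end{aligned}$$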

When Y is viewed as an estimate for the mean $m_x$, we note that its expected
value is equal to $m_x$ and its variance decreases inversely with the number of
samples n. As n approaches infinity, the variance $\sigma_y^2$ approaches zero. An
estimate of a parameter (in this case the mean $m_x$) that satisfies the conditions
that its expected value converges to the true value of the parameter and the
variance converges to zero as $n \to \infty$ is said to be a consistent estimate.
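The inverse dependence of the variance on n can be observed in a small simulation. This is a sketch under stated assumptions rather than part of the derivation: the $X_i$ are taken to be uniform on [0, 1) (so that $\sigma_x^2 = 1/12$), and the seed, the number of trials, and the function name are arbitrary illustrative choices.

```python
import random
import statistics

def sample_mean_variance(n, trials, rng):
    """Empirical variance of Y = (1/n) * sum(X_i) with X_i uniform on [0, 1)."""
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]
    return statistics.pvariance(means)

rng = random.Random(12345)       # fixed seed so the run is repeatable
sigma_x2 = 1.0 / 12.0            # variance of a uniform [0, 1) variable
results = {n: sample_mean_variance(n, 5000, rng) for n in (1, 10, 100)}
for n, var_y in results.items():
    print(f"n={n:4d}  empirical={var_y:.5f}  sigma_x^2/n={sigma_x2 / n:.5f}")
```

Each tenfold increase in n reduces the empirical variance by roughly a factor of ten, in line with the variance expression derived above.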

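The tail probability of the random variable Y can be upper-bounded by use
of the bounds presented in Section 2-1-5. The Chebyshev inequality applied to
Y is

$$P(|Y - m_y| \ge \delta) \le \frac{\sigma_y^2}{\delta^2}$$

$$P\left(\left|\frac{1}{n} \sum_{i=1}^{n} X_i - m_x\right| \ge \delta\right) \le \frac{\sigma_x^2}{n\delta^2} \tag{2-1-188}$$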


In the limit as $n \to \infty$, (2-1-188) becomes

$$\lim_{n \to \infty} P\left(\left|\frac{1}{n} \sum_{i=1}^{n} X_i - m_x\right| \ge \delta\right) = 0 \tag{2-1-189}$$

Therefore, the probability that the estimate of the mean differs from the true
mean $m_x$ by more than $\delta$ ($\delta > 0$) approaches zero as n approaches infinity. This
statement is a form of the law of large numbers. Since the upper bound
converges to zero relatively slowly, i.e., inversely with n, the expression in
(2-1-188) is called the weak law of large numbers.

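The Chernoff bound applied to the random variable Y yields an exponential
dependence on n, and thus provides a tighter upper bound on the one-sided tail
probability. Following the procedure developed in Section 2-1-5, we can
determine that the tail probability for Y is

$$\begin{aligned}
P(Y - m_y \ge \delta) &= P\left(\frac{1}{n} \sum_{i=1}^{n} X_i - m_x \ge \delta\right) \\
&= P\left(\sum_{i=1}^{n} X_i \ge n\delta_m\right) \le E\left\{\exp\left[v\left(\sum_{i=1}^{n} X_i - n\delta_m\right)\right]\right\}
\end{aligned} \tag{2-1-190}$$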


where $\delta_m = m_x + \delta$ and $\delta > 0$. But the $X_i$, $i = 1, 2, \ldots, n$, are statistically
independent and identically distributed. Hence,

$$\begin{aligned}
E\left\{\exp\left[v\left(\sum_{i=1}^{n} X_i - n\delta_m\right)\right]\right\} &= e^{-vn\delta_m} E\left[\exp\left(v \sum_{i=1}^{n} X_i\right)\right] \\
&= e^{-vn\delta_m} \prod_{i=1}^{n} E(e^{vX_i}) \\
&= [e^{-v\delta_m} E(e^{vX})]^n
\end{aligned} \tag{2-1-191}$$

where X denotes any one of the $X_i$. The parameter $v$ that yields the tightest
upper bound is obtained by differentiating (2-1-191) and setting the derivative
equal to zero. This yields the equation

$$E(X e^{vX}) - \delta_m E(e^{vX}) = 0 \tag{2-1-192}$$

Let the solution of (2-1-192) be denoted by $\hat{v}$. Then, the bound on the upper
tail probability is

$$P\left(\frac{1}{n} \sum_{i=1}^{n} X_i \ge \delta_m\right) \le [e^{-\hat{v}\delta_m} E(e^{\hat{v}X})]^n, \qquad \delta_m > m_x \tag{2-1-193}$$

In a similar manner, we find that the lower tail probability is upper-bounded as

$$P(Y \le \delta_m) \le [e^{-\hat{v}\delta_m} E(e^{\hat{v}X})]^n, \qquad \delta_m < m_x \tag{2-1-194}$$

where $\hat{v}$ is the solution to (2-1-192).


Example 2-1-7 

Let $X_i$, $i = 1, 2, \ldots, n$, be a set of statistically independent random variables
defined as

$$X_i = \begin{cases} 1 & \text{with probability } p < \tfrac{1}{2} \\ -1 & \text{with probability } 1 - p \end{cases}$$

We wish to determine a tight upper bound on the probability that the sum
of the $X_i$ is greater than zero. Since $p < \tfrac{1}{2}$, we note that the sum will have a
negative value for the mean; hence we seek the upper tail probability. With
$\delta_m = 0$ in (2-1-193), we have

$$P\left(\sum_{i=1}^{n} X_i \ge 0\right) \le [E(e^{\hat{v}X})]^n \tag{2-1-195}$$

where $\hat{v}$ is the solution to the equation

$$E(X e^{vX}) = 0 \tag{2-1-196}$$

Now

$$E(X e^{vX}) = -(1 - p) e^{-v} + p e^{v} = 0$$

Hence

$$\hat{v} = \ln\sqrt{\frac{1 - p}{p}} \tag{2-1-197}$$

Furthermore,

$$E(e^{\hat{v}X}) = p e^{\hat{v}} + (1 - p) e^{-\hat{v}}$$

Therefore the bound in (2-1-195) becomes

$$\begin{aligned}
P\left(\sum_{i=1}^{n} X_i \ge 0\right) &\le [p e^{\hat{v}} + (1 - p) e^{-\hat{v}}]^n \\
&= \left[p \sqrt{\frac{1 - p}{p}} + (1 - p) \sqrt{\frac{p}{1 - p}}\right]^n \\
&= [4p(1 - p)]^{n/2}
\end{aligned} \tag{2-1-198}$$

We observe that the upper bound decays exponentially with n, as expected.
In contrast, if the Chebyshev bound were evaluated, the tail probability
would decrease inversely with n.
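Because the sum in this example is a shifted binomial variable, the exact tail probability can be computed and set against the bound (2-1-198). The sketch below does that; the value p = 0.25 and the function names are assumptions made for illustration.

```python
import math

def exact_tail(n, p):
    """Exact P(sum of X_i >= 0): at least half of the n variables equal +1."""
    k_min = (n + 1) // 2  # smallest count k of +1 values with 2k - n >= 0
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(k_min, n + 1))

def chernoff_bound(n, p):
    """Chernoff bound (2-1-198): [4p(1 - p)]^(n/2)."""
    return (4.0 * p * (1.0 - p)) ** (n / 2.0)

p = 0.25  # assumed illustrative value, p < 1/2
for n in (10, 20, 50, 100):
    print(f"n={n:4d}  exact={exact_tail(n, p):.3e}  "
          f"Chernoff={chernoff_bound(n, p):.3e}")
```

Both columns fall off exponentially in n, with the bound staying above the exact probability throughout.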


Central Limit Theorem  We conclude this section with an extremely useful
theorem concerning the cdf of a sum of random variables in the limit as the
number of terms in the sum approaches infinity. There are several versions of
this theorem. We shall prove the theorem for the case in which the random
variables $X_i$, $i = 1, 2, \ldots, n$, being summed are statistically independent and
identically distributed, each having a finite mean $m_x$ and a finite variance $\sigma_x^2$.
For convenience, we define the normalized random variable

$$U_i = \frac{X_i - m_x}{\sigma_x}, \qquad i = 1, 2, \ldots, n$$

Thus $U_i$ has a zero mean and unit variance. Now, let

$$Y = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} U_i \tag{2-1-199}$$

Since each term in the sum has a zero mean and unit variance, it follows that
the normalized (by $1/\sqrt{n}$) random variable Y has zero mean and unit variance.
We wish to determine the cdf of Y in the limit as $n \to \infty$.

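The characteristic function of Y is

$$\begin{aligned}
\psi_Y(jv) = E(e^{jvY}) &= E\left[\exp\left(\frac{jv}{\sqrt{n}} \sum_{i=1}^{n} U_i\right)\right] \\
&= \prod_{i=1}^{n} \psi_{U_i}\left(\frac{jv}{\sqrt{n}}\right) \\
&= \left[\psi_U\left(\frac{jv}{\sqrt{n}}\right)\right]^n
\end{aligned} \tag{2-1-200}$$

where U denotes any of the $U_i$, which are identically distributed. Now, let us
expand the characteristic function of U in a Taylor series. The expansion yields

$$\psi_U\left(\frac{jv}{\sqrt{n}}\right) = 1 + j\frac{v}{\sqrt{n}} E(U) - \frac{v^2}{n\,2!} E(U^2) + \frac{(jv)^3}{(\sqrt{n})^3\,3!} E(U^3) + \cdots \tag{2-1-201}$$

Since $E(U) = 0$ and $E(U^2) = 1$, (2-1-201) simplifies to

$$\psi_U\left(\frac{jv}{\sqrt{n}}\right) = 1 - \frac{v^2}{2n} + \frac{1}{n} R(v, n) \tag{2-1-202}$$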

where $R(v, n)/n$ denotes the remainder. We note that $R(v, n)$ approaches
zero as $n \to \infty$. Substitution of (2-1-202) into (2-1-200) yields the characteristic
function of Y in the form

$$\psi_Y(jv) = \left[1 - \frac{v^2}{2n} + \frac{R(v, n)}{n}\right]^n \tag{2-1-203}$$

Taking the natural logarithm of (2-1-203), we obtain

$$\ln \psi_Y(jv) = n \ln\left[1 - \frac{v^2}{2n} + \frac{R(v, n)}{n}\right] \tag{2-1-204}$$

For small values of x, $\ln(1 + x)$ can be expanded in the power series

$$\ln(1 + x) = x - \tfrac{1}{2} x^2 + \tfrac{1}{3} x^3 - \cdots$$

This expansion applied to (2-1-204) yields

$$\ln \psi_Y(jv) = n \left[-\frac{v^2}{2n} + \frac{R(v, n)}{n} - \frac{1}{2} \left(-\frac{v^2}{2n} + \frac{R(v, n)}{n}\right)^2 + \cdots\right] \tag{2-1-205}$$

Finally, when we take the limit as $n \to \infty$, (2-1-205) reduces to
$\lim_{n \to \infty} \ln \psi_Y(jv) = -\tfrac{1}{2} v^2$, or, equivalently,

$$\lim_{n \to \infty} \psi_Y(jv) = e^{-v^2/2} \tag{2-1-206}$$

But this is just the characteristic function of a gaussian random variable with
zero mean and unit variance. Thus we have the important result that the
normalized sum of statistically independent and identically distributed random
variables with finite mean and variance approaches a gaussian cdf as $n \to \infty$.
This result is known as the central limit theorem.

Although we assumed that the random variables in the sum are identically 
distributed, the assumption can be relaxed provided that additional restrictions 
are imposed on the properties of the random variables. There is one variation 
of the theorem, for example, in which the assumption of identically distributed 
random variables is abandoned in favor of a condition on the third absolute 
moment of the random variables in the sum. For a discussion of this and other 
variations of the central limit theorem, the reader is referred to the book by 
Cramer (1946). 
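The convergence in (2-1-206) can be watched directly for a simple case. The sketch below assumes, purely for illustration, that each $U_i$ takes the values +1 and -1 with equal probability, so that it has zero mean, unit variance, and characteristic function $\cos v$; by (2-1-200) the characteristic function of Y is then $[\cos(v/\sqrt{n})]^n$. The function names and the test point v = 1.5 are hypothetical choices.

```python
import math

def char_fn_normalized_sum(v, n):
    """psi_Y(jv) for Y = (1/sqrt(n)) * sum(U_i), with U_i = +1 or -1 equally likely.

    Each U_i has characteristic function cos(v), so (2-1-200) gives
    cos(v / sqrt(n)) ** n.
    """
    return math.cos(v / math.sqrt(n)) ** n

def char_fn_gaussian(v):
    """Characteristic function of a zero-mean, unit-variance gaussian."""
    return math.exp(-0.5 * v * v)

v = 1.5
for n in (1, 10, 100, 10000):
    gap = abs(char_fn_normalized_sum(v, n) - char_fn_gaussian(v))
    print(f"n={n:6d}  |psi_Y - gaussian| = {gap:.2e}")
```

The gap shrinks steadily as n grows, which is the behavior the limiting argument predicts for this particular choice of $U_i$.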


2-2 STOCHASTIC PROCESSES 

Many of the random phenomena that occur in nature are functions of time. 
For example, the meteorological phenomena such as the random fluctuations 
in air temperature and air pressure are functions of time. The thermal noise 
voltages generated in the resistors of an electronic device such as a radio 
receiver are also a function of time. Similarly, the signal at the output of a
source that generates information is characterized as a random signal that
varies with time. An audio signal that is transmitted over a telephone channel
is an example of such a signal. All these are examples of stochastic (random)
processes. In our study of digital communications, we encounter stochastic 
processes in the characterization and modeling of signals generated by 
information sources, in the characterization of communication channels used to 
transmit the information, in the characterization of noise generated in a 
receiver, and in the design of the optimum receiver for processing the received 
random signal. 

At any given time instant, the value of a stochastic process, whether it is the 
value of the noise voltage generated by a resistor or the amplitude of the signal 
generated by an audio source, is a random variable. Thus, we may view a 
stochastic process as a random variable indexed by the parameter t. We shall
denote such a process by X(t). In general, the parameter t is continuous,
whereas X may be either continuous or discrete, depending on the characteristics
of the source that generates the stochastic process.

The noise voltage generated by a single resistor or a single information
source represents a single realization of the stochastic process. Hence, it is
called a sample function of the stochastic process. The set of all possible sample
functions, e.g., the set of all noise voltage waveforms generated by resistors,
constitutes an ensemble of sample functions or, equivalently, the stochastic
process X(t). In general, the number of sample functions in the ensemble is
assumed to be extremely large; often it is infinite.

Having defined a stochastic process X(t) as an ensemble of sample
functions, we may consider the values of the process at any set of time instants
$t_1 > t_2 > t_3 > \cdots > t_n$, where n is any positive integer. In general, the random
variables $X_{t_i} = X(t_i)$, $i = 1, 2, \ldots, n$, are characterized statistically by their joint
pdf $p(x_{t_1}, x_{t_2}, \ldots, x_{t_n})$. Furthermore, all the probabilistic relations defined in
Section 2-1 for multidimensional random variables carry over to the random
variables $X_{t_i}$, $i = 1, 2, \ldots, n$.

Stationary Stochastic Processes  As indicated above, the random variables
$X_{t_i}$, $i = 1, 2, \ldots, n$, obtained from the stochastic process X(t) for any set of
time instants $t_1 > t_2 > t_3 > \cdots > t_n$ and any n are characterized statistically by
the joint pdf $p(x_{t_1}, x_{t_2}, \ldots, x_{t_n})$. Let us consider another set of n random
variables $X_{t_i + t} = X(t_i + t)$, $i = 1, 2, \ldots, n$, where t is an arbitrary time shift.
These random variables are characterized by the joint pdf
$p(x_{t_1 + t}, x_{t_2 + t}, \ldots, x_{t_n + t})$. The joint pdfs of the random variables $X_{t_i}$ and $X_{t_i + t}$,
$i = 1, 2, \ldots, n$, may or may not be identical. When they are identical, i.e.,
when

$$p(x_{t_1}, x_{t_2}, \ldots, x_{t_n}) = p(x_{t_1 + t}, x_{t_2 + t}, \ldots, x_{t_n + t}) \tag{2-2-1}$$

for all t and all n, the stochastic process is said to be stationary in the strict
sense. That is, the statistics of a stationary stochastic process are invariant to
any translation of the time axis. On the other hand, when the joint pdfs are
different, the stochastic process is nonstationary.





2-2-1 Statistical Averages

Just as we have defined statistical averages for random variables, we may
similarly define statistical averages for a stochastic process. Such averages are
also called ensemble averages. Let X(t) denote a random process and let
$X_{t_i} = X(t_i)$. The nth moment of the random variable $X_{t_i}$ is defined as

$$E(X_{t_i}^n) = \int_{-\infty}^{\infty} x_{t_i}^n\, p(x_{t_i})\,dx_{t_i} \tag{2-2-2}$$

In general, the value of the nth moment will depend on the time instant $t_i$ if the
pdf of $X_{t_i}$ depends on $t_i$. When the process is stationary, however, $p(x_{t_i + t}) = p(x_{t_i})$
for all t. Hence, the pdf is independent of time, and, as a consequence,
the nth moment is independent of time.

Next we consider the two random variables $X_{t_i} = X(t_i)$, $i = 1, 2$. The
correlation between $X_{t_1}$ and $X_{t_2}$ is measured by the joint moment

$$E(X_{t_1} X_{t_2}) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_{t_1} x_{t_2}\, p(x_{t_1}, x_{t_2})\,dx_{t_1}\,dx_{t_2} \tag{2-2-3}$$

Since this joint moment depends on the time instants $t_1$ and $t_2$, it is denoted by
$\phi(t_1, t_2)$. The function $\phi(t_1, t_2)$ is called the autocorrelation function of the
stochastic process. When the process X(t) is stationary, the joint pdf of the pair
$(X_{t_1}, X_{t_2})$ is identical to the joint pdf of the pair $(X_{t_1 + t}, X_{t_2 + t})$ for any arbitrary t.
This implies that the autocorrelation function of X(t) does not depend on the
specific time instants $t_1$ and $t_2$, but, instead, it depends on the time difference
$t_1 - t_2$. Thus, for a stationary stochastic process, the joint moment in (2-2-3) is

$$E(X_{t_1} X_{t_2}) = \phi(t_1, t_2) = \phi(t_1 - t_2) = \phi(\tau) \tag{2-2-4}$$

where $\tau = t_1 - t_2$ or, equivalently, $t_2 = t_1 - \tau$. If we let $t_2 = t_1 + \tau$, we have

$$\phi(-\tau) = E(X_{t_1} X_{t_1 + \tau}) = E(X_{t_1'} X_{t_1' - \tau}) = \phi(\tau)$$

Therefore, $\phi(\tau)$ is an even function. We also note that $\phi(0) = E(X_t^2)$ denotes
the average power in the process X(t).
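These properties can be illustrated with a process whose ensemble is easy to enumerate. The sketch below assumes a sinusoid with random phase, $X(t) = \cos(2\pi f_0 t + \theta)$ with $\theta$ uniform on $[0, 2\pi)$, which is not taken from the text; its autocorrelation function is $\tfrac{1}{2}\cos(2\pi f_0 \tau)$. The constants `F0` and `K` and the function name are hypothetical.

```python
import math

F0 = 2.0   # assumed frequency in Hz (illustrative)
K = 360    # number of equally spaced phases used for the ensemble average

def autocorrelation(t1, t2):
    """Ensemble average E[X(t1) X(t2)] for X(t) = cos(2*pi*F0*t + theta).

    The phase theta is uniform on [0, 2*pi); the average over theta is
    approximated by K equally spaced phase values.
    """
    total = 0.0
    for k in range(K):
        theta = 2.0 * math.pi * k / K
        total += (math.cos(2.0 * math.pi * F0 * t1 + theta)
                  * math.cos(2.0 * math.pi * F0 * t2 + theta))
    return total / K

print(autocorrelation(0.3, 0.1), autocorrelation(1.7, 1.5))  # same time difference
print(autocorrelation(0.1, 0.3))                             # sign of tau reversed
print(autocorrelation(0.4, 0.4))                             # average power
```

The first two values agree because only the time difference matters, the third agrees with them because the function is even, and the last is the average power of the assumed process.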

There exist processes that are not stationary in the strict sense but have the
property that the mean value of the process is independent of time (a constant)
and the autocorrelation function satisfies the condition that
$\phi(t_1, t_2) = \phi(t_1 - t_2)$. Such a process is
called wide-sense stationary. Consequently, wide-sense stationarity is a less
stringent condition than strict-sense stationarity. When reference is made to a
stationary stochastic process in any subsequent discussion in which correlation
functions are involved, the less stringent condition (wide-sense stationarity) is
implied.

Related to the autocorrelation function is the autocovariance function of a
stochastic process, which is defined as

$$\begin{aligned}
\mu(t_1, t_2) &= E\{[X_{t_1} - m(t_1)][X_{t_2} - m(t_2)]\} \\
&= \phi(t_1, t_2) - m(t_1) m(t_2)
\end{aligned} \tag{2-2-5}$$

where $m(t_1)$ and $m(t_2)$ are the means of $X_{t_1}$ and $X_{t_2}$, respectively. When the
process is stationary, the autocovariance function simplifies to

$$\mu(t_1, t_2) = \mu(t_1 - t_2) = \mu(\tau) = \phi(\tau) - m^2 \tag{2-2-6}$$

where $\tau = t_1 - t_2$.

Higher-order joint moments of two or more random variables derived from
a stochastic process X(t) are defined in an obvious manner. With the possible
exception of the gaussian random process, for which higher-order moments can
be expressed in terms of first and second moments, high-order moments are
encountered very infrequently in practice.

Averages for a Gaussian Process  Suppose that X(t) is a gaussian random
process. Hence, at time instants $t = t_i$, $i = 1, 2, \ldots, n$, the random variables $X_{t_i}$,
$i = 1, 2, \ldots, n$, are jointly gaussian with mean values $m(t_i)$, $i = 1, 2, \ldots, n$, and
autocovariances

$$\mu(t_i, t_j) = E[(X_{t_i} - m(t_i))(X_{t_j} - m(t_j))], \qquad i, j = 1, 2, \ldots, n \tag{2-2-7}$$

If we denote the $n \times n$ covariance matrix with elements $\mu(t_i, t_j)$ by $\mathbf{M}$ and the
vector of mean values by $\mathbf{m}_x$, then the joint pdf of the random variables $X_{t_i}$,
$i = 1, 2, \ldots, n$, is given by (2-1-150).

If the gaussian process is stationary then $m(t_i) = m$ for all $t_i$ and
$\mu(t_i, t_j) = \mu(t_i - t_j)$. We observe that the gaussian random process is completely specified
by the mean and autocovariance functions. Since the joint gaussian pdf
depends only on these two moments, it follows that if the gaussian process is
wide-sense stationary, it is also strict-sense stationary. Of course, the converse
is always true for any stochastic process.

Averages for Joint Stochastic Processes  Let X(t) and Y(t) denote two
stochastic processes and let $X_{t_i} = X(t_i)$, $i = 1, 2, \ldots, n$, and $Y_{t'_j} = Y(t'_j)$,
$j = 1, 2, \ldots, m$, represent the random variables at times $t_1 > t_2 > t_3 > \cdots > t_n$ and
$t'_1 > t'_2 > \cdots > t'_m$, respectively. The two processes are characterized statistically
by their joint pdf

$$p(x_{t_1}, x_{t_2}, \ldots, x_{t_n}, y_{t'_1}, y_{t'_2}, \ldots, y_{t'_m})$$

for any set of time instants $t_1, t_2, \ldots, t_n$, $t'_1, t'_2, \ldots, t'_m$ and for any positive
integer values of n and m.

The cross-correlation function of X(t) and Y(t), denoted by $\phi_{xy}(t_1, t_2)$, is
defined as the joint moment

$$\phi_{xy}(t_1, t_2) = E(X_{t_1} Y_{t_2}) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_{t_1} y_{t_2}\, p(x_{t_1}, y_{t_2})\,dx_{t_1}\,dy_{t_2} \tag{2-2-8}$$


and the cross-covariance is

$$\mu_{xy}(t_1, t_2) = \phi_{xy}(t_1, t_2) - m_x(t_1) m_y(t_2) \tag{2-2-9}$$

When the processes are jointly and individually stationary, we have
$\phi_{xy}(t_1, t_2) = \phi_{xy}(t_1 - t_2)$ and $\mu_{xy}(t_1, t_2) = \mu_{xy}(t_1 - t_2)$. In this case, we note that

$$\phi_{xy}(-\tau) = E(X_{t_1} Y_{t_1 + \tau}) = E(X_{t_1' - \tau} Y_{t_1'}) = \phi_{yx}(\tau) \tag{2-2-10}$$

The stochastic processes X(t) and Y(t) are said to be statistically independent
if and only if

$$p(x_{t_1}, x_{t_2}, \ldots, x_{t_n}, y_{t'_1}, y_{t'_2}, \ldots, y_{t'_m}) = p(x_{t_1}, x_{t_2}, \ldots, x_{t_n})\, p(y_{t'_1}, y_{t'_2}, \ldots, y_{t'_m})$$

for all choices of $t_i$ and $t'_i$ and for all positive integers n and m. The processes
are said to be uncorrelated if

$$\phi_{xy}(t_1, t_2) = E(X_{t_1}) E(Y_{t_2})$$

Hence,

$$\mu_{xy}(t_1, t_2) = 0$$

A complex-valued stochastic process Z(t) is defined as

$$Z(t) = X(t) + jY(t) \tag{2-2-11}$$

where X(t) and Y(t) are stochastic processes. The joint pdf of the random
variables $Z_{t_i} = Z(t_i)$, $i = 1, 2, \ldots, n$, is given by the joint pdf of the components
$(X_{t_i}, Y_{t_i})$, $i = 1, 2, \ldots, n$. Thus, the pdf that characterizes $Z_{t_i}$, $i = 1, 2, \ldots, n$, is

$$p(x_{t_1}, x_{t_2}, \ldots, x_{t_n}, y_{t_1}, y_{t_2}, \ldots, y_{t_n})$$

The complex-valued stochastic process Z(t) is encountered in the representation
of narrowband bandpass noise in terms of its equivalent lowpass
components. An important characteristic of such a process is its autocorrelation
function. The function is defined as

$$\begin{aligned}
\phi_{zz}(t_1, t_2) &= \tfrac{1}{2} E(Z_{t_1} Z_{t_2}^*) \\
&= \tfrac{1}{2} E[(X_{t_1} + jY_{t_1})(X_{t_2} - jY_{t_2})] \\
&= \tfrac{1}{2} \{\phi_{xx}(t_1, t_2) + \phi_{yy}(t_1, t_2) + j[\phi_{yx}(t_1, t_2) - \phi_{xy}(t_1, t_2)]\}
\end{aligned} \tag{2-2-12}$$

where $\phi_{xx}(t_1, t_2)$ and $\phi_{yy}(t_1, t_2)$ are the autocorrelation functions of X(t) and
Y(t), respectively, and $\phi_{yx}(t_1, t_2)$ and $\phi_{xy}(t_1, t_2)$ are the cross-correlation
functions. The factor of $\tfrac{1}{2}$ in the definition of the autocorrelation function of a
complex-valued stochastic process is an arbitrary but mathematically convenient
normalization factor, as we will demonstrate in our treatment of such
processes in Chapter 4.

When the processes X(t) and Y(t) are jointly and individually stationary,
the autocorrelation function of Z(t) becomes

$$\phi_{zz}(t_1, t_2) = \phi_{zz}(t_1 - t_2) = \phi_{zz}(\tau)$$

where $t_2 = t_1 - \tau$. Also, the complex conjugate of (2-2-12) is

$$\phi_{zz}^*(\tau) = \tfrac{1}{2} E(Z_{t_1}^* Z_{t_1 - \tau}) = \tfrac{1}{2} E(Z_{t_1' + \tau}^* Z_{t_1'}) = \phi_{zz}(-\tau) \tag{2-2-13}$$

Hence, $\phi_{zz}(\tau) = \phi_{zz}^*(-\tau)$.

Now, suppose that $Z(t) = X(t) + jY(t)$ and $W(t) = U(t) + jV(t)$ are two
complex-valued stochastic processes. The cross-correlation function of Z(t)
and W(t) is defined as

$$\begin{aligned}
\phi_{zw}(t_1, t_2) &= \tfrac{1}{2} E(Z_{t_1} W_{t_2}^*) \\
&= \tfrac{1}{2} E[(X_{t_1} + jY_{t_1})(U_{t_2} - jV_{t_2})] \\
&= \tfrac{1}{2} \{\phi_{xu}(t_1, t_2) + \phi_{yv}(t_1, t_2) + j[\phi_{yu}(t_1, t_2) - \phi_{xv}(t_1, t_2)]\}
\end{aligned} \tag{2-2-14}$$

When X(t), Y(t), U(t), and V(t) are pairwise-stationary, the cross-correlation
functions in (2-2-14) become functions of the time difference $\tau = t_1 - t_2$.
Furthermore,

$$\phi_{zw}^*(\tau) = \tfrac{1}{2} E(Z_{t_1}^* W_{t_1 - \tau}) = \tfrac{1}{2} E(Z_{t_1' + \tau}^* W_{t_1'}) = \phi_{wz}(-\tau) \tag{2-2-15}$$

2-2-2 Power Density Spectrum 

The frequency content of a signal is a very basic characteristic that distin- 
guishes one signal from another. In general, a signal can be classified as having 
either a finite (nonzero) average power (infinite energy) or finite energy. The 
frequency content of a finite energy signal is obtained as the Fourier transform 
of the corresponding time function. If the signal is periodic, its energy is 
infinite and, consequently, its Fourier transform does not exist. The mechanism 
for dealing with periodic signals is to represent them in a Fourier series. With 
such a representation, the Fourier coefficients determine the distribution of 
power at the various discrete frequency components. 

A stationary stochastic process is an infinite energy signal, and, hence, its
Fourier transform does not exist. The spectral characteristic of a stochastic
signal is obtained by computing the Fourier transform of the autocorrelation
function. That is, the distribution of power with frequency is given by the
function

$$\Phi(f) = \int_{-\infty}^{\infty} \phi(\tau) e^{-j2\pi f\tau}\,d\tau \tag{2-2-16}$$

The inverse Fourier transform relationship is

$$\phi(\tau) = \int_{-\infty}^{\infty} \Phi(f) e^{j2\pi f\tau}\,df \tag{2-2-17}$$

We observe that

$$\phi(0) = \int_{-\infty}^{\infty} \Phi(f)\,df = E(|X_t|^2) \ge 0 \tag{2-2-18}$$

Since $\phi(0)$ represents the average power of the stochastic signal, which is the
area under $\Phi(f)$, $\Phi(f)$ is the distribution of power as a function of frequency.
Therefore, $\Phi(f)$ is called the power density spectrum of the stochastic process.

If the stochastic process is real, $\phi(\tau)$ is real and even, and, hence, $\Phi(f)$ is
real and even. On the other hand, if the process is complex, $\phi(\tau) = \phi^*(-\tau)$
and, hence,

$$\begin{aligned}
\Phi^*(f) &= \int_{-\infty}^{\infty} \phi^*(\tau) e^{j2\pi f\tau}\,d\tau = \int_{-\infty}^{\infty} \phi^*(-\tau) e^{-j2\pi f\tau}\,d\tau \\
&= \int_{-\infty}^{\infty} \phi(\tau) e^{-j2\pi f\tau}\,d\tau = \Phi(f)
\end{aligned} \tag{2-2-19}$$

Therefore, $\Phi(f)$ is real.
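The transform relation (2-2-16) can be evaluated numerically for a concrete autocorrelation function. The sketch below assumes $\phi(\tau) = e^{-|\tau|}$, a choice made only for illustration, whose transform is known in closed form as $2/(1 + (2\pi f)^2)$. The function names, the truncation limit `t_max`, and the step count are hypothetical.

```python
import math

def autocorrelation(tau):
    """Assumed example autocorrelation function: phi(tau) = exp(-|tau|)."""
    return math.exp(-abs(tau))

def power_spectrum(f, t_max=40.0, steps=40000):
    """Trapezoidal approximation of (2-2-16).

    Because phi is real and even, the transform reduces to
    2 * integral from 0 to infinity of phi(tau) * cos(2*pi*f*tau) d tau;
    the upper limit is truncated at t_max, where phi is negligible.
    """
    h = t_max / steps
    total = 0.5 * (autocorrelation(0.0)
                   + autocorrelation(t_max) * math.cos(2.0 * math.pi * f * t_max))
    for k in range(1, steps):
        tau = k * h
        total += autocorrelation(tau) * math.cos(2.0 * math.pi * f * tau)
    return 2.0 * h * total

def power_spectrum_closed_form(f):
    """Known transform pair for exp(-|tau|): 2 / (1 + (2*pi*f)^2)."""
    return 2.0 / (1.0 + (2.0 * math.pi * f) ** 2)

for f in (0.0, 0.1, 0.5, 2.0):
    print(f"f={f:.1f}  numeric={power_spectrum(f):.6f}  "
          f"closed form={power_spectrum_closed_form(f):.6f}")
```

The numerical values are real, nonnegative, and symmetric in f, as expected for a real process with a real and even autocorrelation function.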

The definition of a power density spectrum can be extended to two jointly
stationary stochastic processes X(t) and Y(t), which have a cross-correlation
function $\phi_{xy}(\tau)$. The Fourier transform of $\phi_{xy}(\tau)$, i.e.,

$$\Phi_{xy}(f) = \int_{-\infty}^{\infty} \phi_{xy}(\tau) e^{-j2\pi f\tau}\,d\tau \tag{2-2-20}$$

is called the cross-power density spectrum. If we conjugate both sides of
(2-2-20), we have

$$\begin{aligned}
\Phi_{xy}^*(f) &= \int_{-\infty}^{\infty} \phi_{xy}^*(\tau) e^{j2\pi f\tau}\,d\tau = \int_{-\infty}^{\infty} \phi_{xy}^*(-\tau) e^{-j2\pi f\tau}\,d\tau \\
&= \int_{-\infty}^{\infty} \phi_{yx}(\tau) e^{-j2\pi f\tau}\,d\tau = \Phi_{yx}(f)
\end{aligned} \tag{2-2-21}$$

This relation holds in general. However, if X(t) and Y(t) are real stochastic
processes,

$$\Phi_{xy}^*(f) = \int_{-\infty}^{\infty} \phi_{xy}(\tau) e^{j2\pi f\tau}\,d\tau = \Phi_{xy}(-f) \tag{2-2-22}$$

By combining the result in (2-2-21) with the result in (2-2-22), we find that the
cross-power density spectrum of two real processes satisfies the condition

$$\Phi_{yx}(f) = \Phi_{xy}(-f) \tag{2-2-23}$$


2-2-3 Response of a Linear Time-Invariant System to a 
Random Input Signal 

Consider a linear time-invariant system (filter) that is characterized by its
impulse response h(t) or, equivalently, by its frequency response H(f), where
h(t) and H(f) are a Fourier transform pair. Let x(t) be the input signal to the
system and let y(t) denote the output signal. The output of the system may be
expressed in terms of the convolution integral as

$$y(t) = \int_{-\infty}^{\infty} h(\tau) x(t - \tau)\,d\tau \tag{2-2-24}$$

Now, suppose that x(t) is a sample function of a stationary stochastic process
X(t). Then, the output y(t) is a sample function of a stochastic process Y(t).
We wish to determine the mean and autocorrelation functions of the output.

Since convolution is a linear operation performed on the input signal x(t),
the expected value of the integral is equal to the integral of the expected value.
Thus, the mean value of Y(t) is

$$\begin{aligned}
m_y = E[Y(t)] &= \int_{-\infty}^{\infty} h(\tau) E[X(t - \tau)]\,d\tau \\
&= m_x \int_{-\infty}^{\infty} h(\tau)\,d\tau = m_x H(0)
\end{aligned} \tag{2-2-25}$$

where H(0) is the frequency response of the linear system at f = 0. Hence, the
mean value of the output process is a constant.

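The autocorrelation function of the output is

$$\begin{aligned}
\phi_{yy}(t_1, t_2) &= \tfrac{1}{2} E(Y_{t_1} Y_{t_2}^*) \\
&= \tfrac{1}{2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(\beta) h^*(\alpha) E[X(t_1 - \beta) X^*(t_2 - \alpha)]\,d\alpha\,d\beta \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(\beta) h^*(\alpha) \phi_{xx}(t_1 - t_2 + \alpha - \beta)\,d\alpha\,d\beta
\end{aligned}$$

The last step indicates that the double integral is a function of the time
difference $t_1 - t_2$. In other words, if the input process is stationary, the output is
also stationary. Hence

$$\phi_{yy}(\tau) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h^*(\alpha) h(\beta) \phi_{xx}(\tau + \alpha - \beta)\,d\alpha\,d\beta \tag{2-2-26}$$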

By evaluating the Fourier transform of both sides of (2-2-26), we obtain the
power density spectrum of the output process in the form

$$\begin{aligned}
\Phi_{yy}(f) &= \int_{-\infty}^{\infty} \phi_{yy}(\tau) e^{-j2\pi f\tau}\,d\tau \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h^*(\alpha) h(\beta) \phi_{xx}(\tau + \alpha - \beta) e^{-j2\pi f\tau}\,d\tau\,d\alpha\,d\beta \\
&= \Phi_{xx}(f) |H(f)|^2
\end{aligned} \tag{2-2-27}$$

Thus, we have the important result that the power density spectrum of the
output signal is the product of the power density spectrum of the input
and the magnitude squared of the frequency response of the system.

When the autocorrelation function $\phi_{yy}(\tau)$ is desired, it is usually easier to
determine the power density spectrum $\Phi_{yy}(f)$ and then to compute the inverse
transform. Thus, we have

$$\begin{aligned}
\phi_{yy}(\tau) &= \int_{-\infty}^{\infty} \Phi_{yy}(f) e^{j2\pi f\tau}\,df \\
&= \int_{-\infty}^{\infty} \Phi_{xx}(f) |H(f)|^2 e^{j2\pi f\tau}\,df
\end{aligned} \tag{2-2-28}$$

We observe that the average power in the output signal is

$$\phi_{yy}(0) = \int_{-\infty}^{\infty} \Phi_{xx}(f) |H(f)|^2\,df \tag{2-2-29}$$

Since $\phi_{yy}(0) = E(|Y_t|^2)$, it follows that

$$\int_{-\infty}^{\infty} \Phi_{xx}(f) |H(f)|^2\,df \ge 0$$

Suppose we let $|H(f)|^2 = 1$ for any arbitrarily small interval $f_1 \le f \le f_2$ and
$H(f) = 0$ outside this interval. Then,

$$\int_{f_1}^{f_2} \Phi_{xx}(f)\,df \ge 0$$

But this is possible if and only if $\Phi_{xx}(f) \ge 0$ for all f.


Example 2-2-1

Suppose that the lowpass filter illustrated in Fig. 2-2-1 is excited by a
stochastic process $x(t)$ having a power density spectrum

$$
\Phi_{xx}(f) = \tfrac12 N_0 \quad \text{for all } f
$$

A stochastic process having a flat power density spectrum is called white
noise. Let us determine the power density spectrum of the output process.

FIGURE 2-2-1 An example of a lowpass filter.

FIGURE 2-2-2 The power density spectrum of the lowpass filter output when
the input is white noise.

The transfer function of the lowpass filter is

$$
H(f) = \frac{R}{R + j2\pi fL} = \frac{1}{1 + j2\pi fL/R}
$$

and, hence,

$$
|H(f)|^2 = \frac{1}{1 + (2\pi L/R)^2 f^2} \tag{2-2-30}
$$

The power density spectrum of the output process is

$$
\Phi_{yy}(f) = \frac{N_0}{2}\,\frac{1}{1 + (2\pi L/R)^2 f^2} \tag{2-2-31}
$$

This power density spectrum is illustrated in Fig. 2-2-2. Its inverse Fourier
transform yields the autocorrelation function

$$
\phi_{yy}(\tau) = \int_{-\infty}^{\infty} \frac{N_0}{2}\,\frac{1}{1 + (2\pi L/R)^2 f^2}\,e^{j2\pi f\tau}\,df
= \frac{RN_0}{4L}\,e^{-(R/L)|\tau|} \tag{2-2-32}
$$

The autocorrelation function $\phi_{yy}(\tau)$ is shown in Fig. 2-2-3. We observe that
the second moment of the process $Y(t)$ is $\phi_{yy}(0) = RN_0/4L$.
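
The closed form in (2-2-32) can be checked numerically. The following is a minimal sketch, assuming illustrative component values (R = 1, L = 1, N0 = 2) and a truncated integration range; the function names and the parameters `f_max` and `df` are hypothetical and not part of the text. It integrates the output spectrum (2-2-31) against the Fourier kernel and compares the result with the exponential autocorrelation.

```python
import numpy as np

def output_psd(f, R, L, N0):
    """Phi_yy(f) of (2-2-31): white noise of level N0/2 through the R-L lowpass filter."""
    return 0.5 * N0 / (1.0 + (2.0 * np.pi * L / R) ** 2 * f ** 2)

def autocorr_closed_form(tau, R, L, N0):
    """phi_yy(tau) of (2-2-32)."""
    return R * N0 / (4.0 * L) * np.exp(-(R / L) * np.abs(tau))

def autocorr_numeric(tau, R, L, N0, f_max=2000.0, df=1e-3):
    """Inverse Fourier transform of the PSD by trapezoidal integration.

    The PSD is real and even, so the transform reduces to
    2 * integral from 0 to f_max of Phi_yy(f) cos(2 pi f tau) df.
    Truncating at f_max is an approximation (assumed large enough here).
    """
    n = int(round(f_max / df))
    f = np.linspace(0.0, f_max, n + 1)
    step = f[1] - f[0]
    g = output_psd(f, R, L, N0) * np.cos(2.0 * np.pi * f * tau)
    return 2.0 * step * (np.sum(g) - 0.5 * (g[0] + g[-1]))

R, L, N0 = 1.0, 1.0, 2.0   # illustrative values only
for tau in (0.0, 0.5, 1.0):
    print(tau, autocorr_numeric(tau, R, L, N0), autocorr_closed_form(tau, R, L, N0))
```

The two columns should agree to within the truncation error of the finite integration range, which shrinks as `f_max` grows.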


FIGURE 2-2-3 The autocorrelation function of the output of the lowpass filter
for a white-noise input.

As a final exercise, we determine the cross-correlation function between
$y(t)$ and $x(t)$, where $x(t)$ denotes the input and $y(t)$ denotes the output of the
linear system. We have

$$
\begin{aligned}
\phi_{yx}(t_1,t_2) = \tfrac12 E(Y_{t_1}X_{t_2}^*) &= \tfrac12 \int_{-\infty}^{\infty} h(\alpha)\,E[X(t_1-\alpha)X^*(t_2)]\,d\alpha \\
&= \int_{-\infty}^{\infty} h(\alpha)\,\phi_{xx}(t_1-t_2-\alpha)\,d\alpha = \phi_{yx}(t_1-t_2)
\end{aligned}
$$

Hence, the stochastic processes $X(t)$ and $Y(t)$ are jointly stationary. With
$t_1 - t_2 = \tau$, we have

$$
\phi_{yx}(\tau) = \int_{-\infty}^{\infty} h(\alpha)\,\phi_{xx}(\tau-\alpha)\,d\alpha \tag{2-2-33}
$$

Note that the integral in (2-2-33) is a convolution integral. Hence in the
frequency domain the relation (2-2-33) becomes

$$
\Phi_{yx}(f) = \Phi_{xx}(f)\,H(f) \tag{2-2-34}
$$

We observe that if the input process is white noise, the cross correlation of the
input with the output of the system yields the impulse response $h(t)$ to within a
scale factor.
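
The white-noise observation can be illustrated with a discrete-time analogue of (2-2-33). The sketch below assumes a real-valued white Gaussian input and a short FIR system; the impulse response `h_true` and the helper `estimate_impulse_response` are hypothetical names introduced only for this illustration. For real signals the cross-correlation is taken as a plain time average, without the factor of one half used for complex processes.

```python
import numpy as np

def estimate_impulse_response(x, y, num_taps):
    """Estimate h(k) as E[y(n) x(n-k)] / E[x(n)^2] for k = 0..num_taps-1 (real signals)."""
    n = len(x)
    power = np.mean(x * x)   # scale factor: the input variance
    cross = np.array([np.mean(y[k:] * x[:n - k]) for k in range(num_taps)])
    return cross / power

rng = np.random.default_rng(0)
h_true = np.array([1.0, 0.5, -0.25, 0.125])   # hypothetical impulse response
x = rng.standard_normal(200_000)              # white, zero mean, unit variance
y = np.convolve(x, h_true)[:len(x)]           # causal FIR filtering of the input
h_est = estimate_impulse_response(x, y, len(h_true))
print(np.round(h_est, 3))
```

Because the estimate is a sample average, it matches `h_true` only up to statistical fluctuation that decreases as the record length grows.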


2-2-4 Sampling Theorem for Band-Limited 
Stochastic Processes 

Recall that a deterministic signal $s(t)$ that has a Fourier transform $S(f)$ is
called band-limited if $S(f) = 0$ for $|f| > W$, where $W$ is the highest frequency
contained in $s(t)$. Such a signal is uniquely represented by samples of $s(t)$ taken
at a rate of $f_s \ge 2W$ samples/s. The minimum rate $f_N = 2W$ samples/s is called
the Nyquist rate. Sampling below the Nyquist rate results in frequency aliasing.

The band-limited signal sampled at the Nyquist rate can be reconstructed
from its samples by use of the interpolation formula

$$
s(t) = \sum_{n=-\infty}^{\infty} s\!\left(\frac{n}{2W}\right)
\frac{\sin[2\pi W(t - n/2W)]}{2\pi W(t - n/2W)} \tag{2-2-35}
$$

where $\{s(n/2W)\}$ are the samples of $s(t)$ taken at $t = n/2W$, $n = 0, \pm1, \pm2, \ldots$.
Equivalently, $s(t)$ can be reconstructed by passing the sampled signal through
an ideal low-pass filter with impulse response $h(t) = (\sin 2\pi Wt)/2\pi Wt$. Figure
2-2-4 illustrates the signal reconstruction process based on ideal interpolation.

FIGURE 2-2-4 Signal reconstruction based on ideal interpolation.
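
The interpolation formula of the sampling theorem can be sketched in a few lines. The example below is an illustration under stated assumptions: the test signal, the bandwidth `W = 4.0`, and the truncation of the infinite sum to 1001 samples are all choices made here, not values from the text. It relies on `np.sinc(u) = sin(pi u)/(pi u)`, so the interpolation kernel is `np.sinc(2*W*t - n)`.

```python
import numpy as np

def sinc_interpolate(samples, n_indices, W, t):
    """Evaluate the sampling-theorem interpolation sum with a finite set of samples s(n/2W)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # Row i holds the kernel values sinc(2 W t_i - n) for every sample index n.
    kernel = np.sinc(2.0 * W * t[:, None] - n_indices[None, :])
    return kernel @ samples

W = 4.0   # assumed bandwidth in Hz

def s(t):
    # sinc(W t)^2 has a triangular spectrum confined to |f| <= W.
    return np.sinc(W * t) ** 2

n = np.arange(-500, 501)           # truncated index range (an approximation)
samples = s(n / (2.0 * W))         # samples taken at the Nyquist rate 2W
t_test = np.array([0.013, 0.2, -0.37, 1.01])
error = np.max(np.abs(sinc_interpolate(samples, n, W, t_test) - s(t_test)))
print(error)
```

The residual error comes only from truncating the sum; it decreases as more samples are included.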

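A stationary stochastic process $X(t)$ is said to be band-limited if its power
density spectrum $\Phi(f) = 0$ for $|f| > W$. Since $\Phi(f)$ is the Fourier transform of
the autocorrelation function $\phi(\tau)$, it follows that $\phi(\tau)$ can be represented as

$$
\phi(\tau) = \sum_{n=-\infty}^{\infty} \phi\!\left(\frac{n}{2W}\right)
\frac{\sin[2\pi W(\tau - n/2W)]}{2\pi W(\tau - n/2W)} \tag{2-2-36}
$$

where $\{\phi(n/2W)\}$ are samples of $\phi(\tau)$ taken at $\tau = n/2W$, $n = 0, \pm1, \pm2, \ldots$.

Now, if $X(t)$ is a band-limited stationary stochastic process, then $X(t)$ can be
represented as

$$
X(t) = \sum_{n=-\infty}^{\infty} X\!\left(\frac{n}{2W}\right)
\frac{\sin[2\pi W(t - n/2W)]}{2\pi W(t - n/2W)} \tag{2-2-37}
$$

where $\{X(n/2W)\}$ are samples of $X(t)$ taken at $t = n/2W$, $n = 0, \pm1, \pm2, \ldots$.
This is the sampling representation for a stationary stochastic process. The
samples are random variables that are described statistically by appropriate
joint probability density functions. The signal representation in (2-2-37) is
easily established by showing that (Problem 2-17)

$$
E\left\{\left| X(t) - \sum_{n=-\infty}^{\infty} X\!\left(\frac{n}{2W}\right)
\frac{\sin[2\pi W(t - n/2W)]}{2\pi W(t - n/2W)} \right|^2\right\} = 0 \tag{2-2-38}
$$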


Hence, equality between the sampling representation and the stochastic
process $X(t)$ holds in the sense that the mean square error is zero.


2-2-5 Discrete-Time Stochastic Signals and Systems

The characterization of continuous-time stochastic signals given above can be 
easily carried over to discrete-time stochastic signals. Such signals are usually 
obtained by uniformly sampling a continuous-time stochastic process. 

A discrete-time stochastic process $X(n)$ consists of an ensemble of sample
sequences $\{x(n)\}$. The statistical properties of $X(n)$ are similar to the
characterization of $X(t)$ with the restriction that $n$ is now an integer (time)
variable. Hence, the $m$th moment of $X(n)$ is defined as

$$
E(X_n^m) = \int_{-\infty}^{\infty} x_n^m\,p(x_n)\,dx_n \tag{2-2-39}
$$

and the autocorrelation sequence is

$$
\phi(n,k) = \tfrac12 E(X_nX_k^*) = \tfrac12 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
x_n x_k^*\,p(x_n, x_k)\,dx_n\,dx_k \tag{2-2-40}
$$

Similarly, the autocovariance sequence is

$$
\mu(n,k) = \phi(n,k) - \tfrac12 E(X_n)E(X_k^*) \tag{2-2-41}
$$

For a stationary process, we have $\phi(n,k) \equiv \phi(n-k)$, $\mu(n,k) \equiv \mu(n-k)$, and

$$
\mu(n-k) = \phi(n-k) - \tfrac12 |m_x|^2 \tag{2-2-42}
$$

where $m_x = E(X_n)$ is the mean value.

As in the case of continuous-time stochastic processes, a discrete-time
stationary process has infinite energy but a finite average power, which is
given as

$$
E(|X_n|^2) = \phi(0) \tag{2-2-43}
$$

The power density spectrum for the discrete-time process is obtained by
computing the Fourier transform of $\phi(n)$. Since $\phi(n)$ is a discrete-time
sequence, the Fourier transform is defined as

$$
\Phi(f) = \sum_{n=-\infty}^{\infty} \phi(n)\,e^{-j2\pi fn} \tag{2-2-44}
$$

and the inverse transform relationship is

$$
\phi(n) = \int_{-1/2}^{1/2} \Phi(f)\,e^{j2\pi fn}\,df \tag{2-2-45}
$$

We make the observation that the power density spectrum $\Phi(f)$ is periodic
with a period $f_p = 1$. In other words, $\Phi(f + k) = \Phi(f)$ for $k = \pm1, \pm2, \ldots$. This
is a characteristic of the Fourier transform of any discrete-time sequence such
as $\phi(n)$.

Finally, let us consider the response of a discrete-time, linear time-invariant
system to a stationary stochastic input signal. The system is characterized in
the time domain by its unit sample response $h(n)$ and in the frequency domain
by the frequency response $H(f)$, where

$$
H(f) = \sum_{n=-\infty}^{\infty} h(n)\,e^{-j2\pi fn} \tag{2-2-46}
$$

The response of the system to the stationary stochastic input signal $X(n)$ is
given by the convolution sum

$$
y(n) = \sum_{k=-\infty}^{\infty} h(k)\,x(n-k) \tag{2-2-47}
$$

The mean value of the output of the system is

$$
\begin{aligned}
m_y = E[y(n)] &= \sum_{k=-\infty}^{\infty} h(k)\,E[x(n-k)] \\
&= m_x \sum_{k=-\infty}^{\infty} h(k) = m_x H(0)
\end{aligned}
\tag{2-2-48}
$$

where $H(0)$ is the zero frequency (dc) gain of the system.

The autocorrelation sequence for the output process is

$$
\begin{aligned}
\phi_{yy}(k) &= \tfrac12 E[y^*(n)y(n+k)] \\
&= \tfrac12 \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} h^*(i)h(j)\,E[x^*(n-i)x(n+k-j)] \\
&= \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} h^*(i)h(j)\,\phi_{xx}(k-j+i)
\end{aligned}
\tag{2-2-49}
$$

This is the general form for the autocorrelation sequence of the system output
in terms of the autocorrelation of the system input and the unit sample
response of the system. By taking the Fourier transform of $\phi_{yy}(k)$ and
substituting the relation in (2-2-49), we obtain the corresponding frequency
domain relationship

$$
\Phi_{yy}(f) = \Phi_{xx}(f)\,|H(f)|^2 \tag{2-2-50}
$$

which is identical to (2-2-27) except that in (2-2-50) the power density spectra
$\Phi_{yy}(f)$ and $\Phi_{xx}(f)$ and the frequency response $H(f)$ are periodic functions of
frequency with period $f_p = 1$.
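
The discrete-time relation between input and output spectra can be verified directly for a white input. The sketch below is an illustration under assumptions: the unit sample response `h`, the input level `sigma2`, and the helper names are hypothetical. It takes a real FIR system and an input with autocorrelation sequence `sigma2 * delta(k)`, forms the output autocorrelation from the double sum in (2-2-49), and compares its transform (2-2-44) with `sigma2 * |H(f)|^2`.

```python
import numpy as np

def output_autocorr_white(h, sigma2):
    """phi_yy(k) from (2-2-49) when phi_xx(k) = sigma2 * delta(k) and h is real.

    Returns the values for lags k = -(M-1), ..., (M-1), where M = len(h).
    """
    h = np.asarray(h, dtype=float)
    return sigma2 * np.correlate(h, h, mode="full")

def dtft(seq, lags, f):
    """Phi(f) = sum over k of seq(k) exp(-j 2 pi f k), as in (2-2-44)."""
    return np.exp(-2j * np.pi * np.outer(f, lags)) @ seq

h = np.array([1.0, -0.5, 0.25])   # hypothetical unit sample response
sigma2 = 2.0                      # assumed input autocorrelation at lag zero
lags = np.arange(-(len(h) - 1), len(h))
phi_yy = output_autocorr_white(h, sigma2)

f = np.linspace(-0.5, 0.5, 101)   # one period of the spectrum
psd_from_autocorr = dtft(phi_yy, lags, f)
H = dtft(h, np.arange(len(h)), f)
psd_from_response = sigma2 * np.abs(H) ** 2
print(np.max(np.abs(psd_from_autocorr - psd_from_response)))
```

The printed difference should be at the level of floating-point rounding, since both sides are finite sums.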


2-2-6 Cyclostationary Processes 

In dealing with signals that carry digital information we encounter stochastic
processes that have statistical averages that are periodic. To be specific, let us
consider a stochastic process of the form

$$
X(t) = \sum_{n=-\infty}^{\infty} a_n\,g(t - nT) \tag{2-2-51}
$$

where $\{a_n\}$ is a (discrete-time) sequence of random variables with mean
$m_a = E(a_n)$ for all $n$ and autocorrelation sequence $\phi_{aa}(k) = \tfrac12 E(a_n^*a_{n+k})$. The
signal $g(t)$ is deterministic. The stochastic process $X(t)$ represents the signal for
several different types of linear modulation techniques which are introduced in
Chapter 4. The sequence $\{a_n\}$ represents the digital information sequence (of
symbols) that is transmitted over the communication channel and $1/T$
represents the rate of transmission of the information symbols.

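Let us determine the mean and autocorrelation function of $X(t)$. First, the
mean value is

$$
\begin{aligned}
E[X(t)] &= \sum_{n=-\infty}^{\infty} E(a_n)\,g(t - nT) \\
&= m_a \sum_{n=-\infty}^{\infty} g(t - nT)
\end{aligned}
\tag{2-2-52}
$$

We observe that the mean is time-varying. In fact, it is periodic with period $T$.
The autocorrelation function of $X(t)$ is

$$
\begin{aligned}
\phi_{xx}(t+\tau, t) &= \tfrac12 E[X(t+\tau)X^*(t)] \\
&= \tfrac12 \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} E(a_n^*a_m)\,g^*(t - nT)\,g(t + \tau - mT) \\
&= \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} \phi_{aa}(m-n)\,g^*(t - nT)\,g(t + \tau - mT)
\end{aligned}
\tag{2-2-53}
$$

Again, we observe that

$$
\phi_{xx}(t + \tau + kT, t + kT) = \phi_{xx}(t + \tau, t) \tag{2-2-54}
$$

for $k = \pm1, \pm2, \ldots$. Hence, the autocorrelation function of $X(t)$ is also
periodic with period $T$.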

Such a stochastic process is called cyclostationary or periodically stationary.
Since the autocorrelation function depends on both the variables $t$ and $\tau$, its
frequency domain representation requires the use of a two-dimensional
Fourier transform.

Since it is highly desirable to characterize such signals by their power
density spectrum, an alternative approach is to compute the time-average
autocorrelation function over a single period, defined as

$$
\bar{\phi}_{xx}(\tau) = \frac{1}{T}\int_{-T/2}^{T/2} \phi_{xx}(t + \tau, t)\,dt \tag{2-2-55}
$$

Thus, we eliminate the time dependence by dealing with the average
autocorrelation function. Now, the Fourier transform of $\bar{\phi}_{xx}(\tau)$ yields the
average power density spectrum of the cyclostationary stochastic process. This
approach allows us to simply characterize cyclostationary processes in the
frequency domain in terms of the power spectrum. That is, the power density
spectrum is

$$
\Phi_{xx}(f) = \int_{-\infty}^{\infty} \bar{\phi}_{xx}(\tau)\,e^{-j2\pi f\tau}\,d\tau \tag{2-2-56}
$$


2-3 BIBLIOGRAPHICAL NOTES AND REFERENCES 

In this chapter we have provided a review of basic concepts and definitions in 
the theory of probability and stochastic processes. As stated in the opening 
paragraph, this theory is an important mathematical tool in the statistical 
modeling of information sources, communication channels, and in the design of 
digital communication systems. Of particular importance in the evaluation of 
communication system performance is the Chernoff bound. This bound is
frequently used in bounding the probability of error of digital communication 
systems that employ coding in the transmission of information. Our coverage 
also highlighted a number of probability distributions and their properties, 
which are frequently encountered in the design of digital communication 
systems. 

The texts by Davenport and Root (1958), Davenport (1970), Papoulis
(1984), Peebles (1987), Helstrom (1991), and Leon-Garcia (1994) provide
engineering-oriented treatments of probability and stochastic processes. A
more mathematical treatment of probability theory may be found in the text by
Loève (1955). Finally, we cite the book by Miller (1964), which treats
multidimensional gaussian distributions.


PROBLEMS 


2-1 One experiment has four mutually exclusive outcomes $A_i$, $i = 1, 2, 3, 4$, and a
second experiment has three mutually exclusive outcomes $B_j$, $j = 1, 2, 3$. The joint
probabilities $P(A_i, B_j)$ are

$P(A_1, B_1) = 0.10$,   $P(A_1, B_2) = 0.08$,   $P(A_1, B_3) = 0.13$
$P(A_2, B_1) = 0.05$,   $P(A_2, B_2) = 0.03$,   $P(A_2, B_3) = 0.09$
$P(A_3, B_1) = 0.05$,   $P(A_3, B_2) = 0.12$,   $P(A_3, B_3) = 0.14$
$P(A_4, B_1) = 0.11$,   $P(A_4, B_2) = 0.04$,   $P(A_4, B_3) = 0.06$

Determine the probabilities $P(A_i)$, $i = 1, 2, 3, 4$, and $P(B_j)$, $j = 1, 2, 3$.


2-2 The random variables $X_i$, $i = 1, 2, \ldots, n$, have the joint pdf $p(x_1, x_2, \ldots, x_n)$.
Prove that

$$
p(x_1, x_2, x_3, \ldots, x_n) = p(x_n \mid x_{n-1}, \ldots, x_1)\,p(x_{n-1} \mid x_{n-2}, \ldots, x_1) \cdots p(x_3 \mid x_2, x_1)\,p(x_2 \mid x_1)\,p(x_1)
$$

2-3 The pdf of a random variable $X$ is $p(x)$. A random variable $Y$ is defined as

$$
Y = aX + b
$$

where $a < 0$. Determine the pdf of $Y$ in terms of the pdf of $X$.

2-4 Suppose that $X$ is a gaussian random variable with zero mean and unit variance.
Let

$$
Y = aX^3 + b, \qquad a > 0
$$

Determine and plot the pdf of $Y$.

2-5 a Let $X_r$ and $X_i$ be statistically independent zero-mean gaussian random variables
with identical variance. Show that a (rotational) transformation of the form

$$
Y_r + jY_i = (X_r + jX_i)\,e^{j\phi}
$$

results in another pair $(Y_r, Y_i)$ of gaussian random variables that have the same
joint pdf as the pair $(X_r, X_i)$.
b Note that

$$
\begin{bmatrix} Y_r \\ Y_i \end{bmatrix} = \mathbf{A} \begin{bmatrix} X_r \\ X_i \end{bmatrix}
$$

where $\mathbf{A}$ is a $2 \times 2$ matrix. As a generalization of the two-dimensional
transformation of the gaussian random variables considered in (a), what
property must the linear transformation $\mathbf{A}$ satisfy if the pdfs for $\mathbf{X}$ and $\mathbf{Y}$, where
$\mathbf{Y} = \mathbf{A}\mathbf{X}$, $\mathbf{X} = (X_1\ X_2 \cdots X_n)$ and $\mathbf{Y} = (Y_1\ Y_2 \cdots Y_n)$, are identical?

2-6 The random variable $Y$ is defined as

$$
Y = \sum_{i=1}^{n} X_i
$$

where the $X_i$, $i = 1, 2, \ldots, n$, are statistically independent random variables with

$$
X_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}
$$

a Determine the characteristic function of $Y$.

b From the characteristic function, determine the moments $E(Y)$ and $E(Y^2)$.

2-7 The four random variables $X_1$, $X_2$, $X_3$, $X_4$ are zero-mean jointly gaussian
random variables with covariance $\mu_{ij} = E(X_iX_j)$ and characteristic function
$\psi(jv_1, jv_2, jv_3, jv_4)$. Show that

$$
E(X_1X_2X_3X_4) = \mu_{12}\mu_{34} + \mu_{13}\mu_{24} + \mu_{14}\mu_{23}
$$

2-8 From the characteristic functions for the central chi-square and noncentral
chi-square random variables given by (2-1-109) and (2-1-117), respectively,
determine the corresponding first and second moments given by (2-1-112) and
(2-1-125).

2-9 The pdf of a Cauchy distributed random variable $X$ is

$$
p(x) = \frac{a/\pi}{x^2 + a^2}, \qquad -\infty < x < \infty
$$

a Determine the mean and variance of $X$.
b Determine the characteristic function of $X$.

2-10 The random variable $Y$ is defined as

$$
Y = \frac{1}{n}\sum_{i=1}^{n} X_i
$$

where $X_i$, $i = 1, 2, \ldots, n$, are statistically independent and identically distributed
random variables each of which has the Cauchy pdf given in Problem 2-9.
a Determine the characteristic function of $Y$.
b Determine the pdf of $Y$.

c Consider the pdf of $Y$ in the limit as $n \to \infty$. Does the central limit theorem hold? Explain
your answer.

2-11 Assume that random processes $x(t)$ and $y(t)$ are individually and jointly stationary.
a Determine the autocorrelation function of $z(t) = x(t) + y(t)$.
b Determine the autocorrelation function of $z(t)$ when $x(t)$ and $y(t)$ are
uncorrelated.
c Determine the autocorrelation function of $z(t)$ when $x(t)$ and $y(t)$ are
uncorrelated and have zero means.

2-12 The autocorrelation function of a stochastic process $X(t)$ is

$$
\phi_{xx}(\tau) = \tfrac12 N_0\,\delta(\tau)
$$

Such a process is called white noise. Suppose $x(t)$ is the input to an ideal bandpass
filter having the frequency response characteristic shown in Fig. P2-12. Determine
the total noise power at the output of the filter.

2-13 The covariance matrix of three random variables $X_1$, $X_2$, and $X_3$ is



The linear transformation $\mathbf{Y} = \mathbf{A}\mathbf{X}$ is made where

$$
\mathbf{A} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 1 & 0 & 1 \end{bmatrix}
$$

Determine the covariance matrix of $\mathbf{Y}$.

FIGURE P2-16

2-14 Let $X(t)$ be a stationary real normal process with zero mean. Let a new process
$Y(t)$ be defined by

$$
Y(t) = X^2(t)
$$

Determine the autocorrelation function of $Y(t)$ in terms of the autocorrelation
function of $X(t)$. Hint: Use the result on gaussian variables derived in Problem
2-7.

2-15 For the Nakagami pdf, given by (2-1-147), define the normalized random variable
$X = R/\sqrt{\Omega}$. Determine the pdf of $X$.

2-16 The input $X(t)$ in the circuit shown in Fig. P2-16 is a stochastic process with
$E[X(t)] = 0$ and $\phi_{xx}(\tau) = \sigma^2\delta(\tau)$, i.e., $X(t)$ is a white noise process.
a Determine the spectral density $\Phi_{yy}(f)$.
b Determine $\phi_{yy}(\tau)$ and $E[Y^2(t)]$.

2-17 Demonstrate the validity of (2-2-38). 

2-18 Use the Chernoff bound to show that $Q(x) \le e^{-x^2/2}$, where $Q(x)$ is defined by
(2-1-97).

2-19 Determine the mean, the autocorrelation sequence, and the power density 
spectrum of the output of a system with unit sample response 


h(n) = 



(* - 0 ) 

(* «l) 

(n =2) 
(otherwise) 


when the input $x(n)$ is a white-noise process with variance $\sigma_x^2$.

2-20 The autocorrelation sequence of a discrete-time stochastic process is =(() ,t| . 
Determine its power density spectrum. 

2-21 A discrete-time stochastic process $X(n) \equiv X(nT)$ is obtained by periodic sampling
of a continuous-time zero-mean stationary process $X(t)$, where $T$ is the sampling
interval, i.e., $f_s = 1/T$ is the sampling rate.

a Determine the relationship between the autocorrelation function of $X(t)$ and
the autocorrelation sequence of $X(n)$.

b Express the power density spectrum of $X(n)$ in terms of the power density
spectrum of the process $X(t)$.

c Determine the conditions under which the power density spectrum of $X(n)$ is
equal to the power density spectrum of $X(t)$.

2-22 Consider a band-limited zero-mean stationary stochastic process $X(t)$ with power density
spectrum



(1/1 ^VV) 
(\f\>W) 


$X(t)$ is sampled at a rate $f_s = 1/T$ to yield a discrete-time process $X(n) \equiv X(nT)$.
a Determine the expression for the autocorrelation sequence of $X(n)$.
b Determine the minimum value of $T$ that results in a white (spectrally flat)
sequence.

c Repeat (b) if the power density spectrum of $X(t)$ is

$$
\Phi(f) = \begin{cases} 1 - |f|/W & (|f| \le W) \\ 0 & (|f| > W) \end{cases}
$$


2-23 Show that the functions

$$
f_k(t) = \frac{\sin[2\pi W(t - k/2W)]}{2\pi W(t - k/2W)}, \qquad k = 0, \pm1, \pm2, \ldots
$$

are orthogonal over the interval $[-\infty, \infty]$, i.e.,

$$
\int_{-\infty}^{\infty} f_k(t)f_j(t)\,dt = \begin{cases} 1/2W & (k = j) \\ 0 & (k \ne j) \end{cases}
$$

Therefore, the sampling theorem reconstruction formula may be viewed as a series
expansion of the band-limited signal $s(t)$, where the weights are samples of $s(t)$
and the $\{f_k(t)\}$ are the set of orthogonal functions used in the series expansion.

2-24 The noise equivalent bandwidth of a system is defined as

$$
B_{eq} = \frac{1}{G}\int_0^{\infty} |H(f)|^2\,df
$$

where $G = \max|H(f)|^2$. Using this definition, determine the noise equivalent
bandwidth of the ideal bandpass filter shown in Fig. P2-12 and the lowpass system
shown in Fig. P2-16.


SOURCE CODING


Communication systems are designed to transmit the information generated by 
a source to some destination. Information sources may take a variety of 
different forms. For example, in radio broadcasting, the source is generally an 
audio source (voice or music). In TV broadcasting, the information source is a 
video source whose output is a moving image. The outputs of these sources are 
analog signals and, hence, the sources are called analog sources. In contrast, 
computers and storage devices, such as magnetic or optical disks, produce 
discrete outputs (usually binary or ASCII characters) and, hence, they are 
called discrete sources. 

Whether a source is analog or discrete, a digital communication system is 
designed to transmit information in digital form. Consequently, the output of 
the source must be converted to a format that can be transmitted digitally. This 
conversion of the source output to a digital form is generally performed by the 
source encoder, whose output may be assumed to be a sequence of binary 
digits. 

In this chapter, we treat source encoding based on mathematical models of 
information sources and a quantitative measure of the information emitted by 
a source. We consider the encoding of discrete sources first and then we discuss 
the encoding of analog sources. We begin by developing mathematical models 
for information sources. 


3-1 MATHEMATICAL MODELS FOR INFORMATION 
SOURCES 

Any information source produces an output that is random, i.e., the source
output is characterized in statistical terms. Otherwise, if the source output
were known exactly, there would be no need to transmit it. In this section, we
consider both discrete and analog information sources, and we postulate
mathematical models for each type of source.

The simplest type of discrete source is one that emits a sequence of letters
selected from a finite alphabet. For example, a binary source emits a binary
sequence of the form 100101110..., where the alphabet consists of the two
letters {0, 1}. More generally, a discrete information source with an alphabet of
$L$ possible letters, say $\{x_1, x_2, \ldots, x_L\}$, emits a sequence of letters selected
from the alphabet.

To construct a mathematical model for a discrete source, we assume that
each letter in the alphabet $\{x_1, x_2, \ldots, x_L\}$ has a given probability $p_k$ of
occurrence. That is,

$$
p_k = P(X = x_k), \qquad 1 \le k \le L
$$

where

$$
\sum_{k=1}^{L} p_k = 1
$$


We consider two mathematical models of discrete sources. In the first, we 
assume that the output sequence from the source is statistically independent. 
That is, the current output letter is statistically independent from all past and 
future outputs. A source whose output satisfies the condition of statistical 
independence among output letters in the sequence is said to be memoryless. 
Such a source is called a discrete memoryless source (DMS). 

If the discrete source output is statistically dependent, as, for example,
English text, we may construct a mathematical model based on statistical
stationarity. By definition, a discrete source is said to be stationary if the
joint probabilities of two sequences of length $n$, say $a_1, a_2, \ldots, a_n$ and
$a_{1+m}, a_{2+m}, \ldots, a_{n+m}$, are identical for all $n \ge 1$ and for all shifts $m$. In other
words, the joint probabilities for any arbitrary length sequence of source
outputs are invariant under a shift in the time origin.

An analog source has an output waveform $x(t)$ that is a sample function of a
stochastic process $X(t)$. We assume that $X(t)$ is a stationary stochastic process
with autocorrelation function $\phi_{xx}(\tau)$ and power spectral density $\Phi_{xx}(f)$. When
$X(t)$ is a bandlimited stochastic process, i.e., $\Phi_{xx}(f) = 0$ for $|f| \ge W$, the
sampling theorem may be used to represent $X(t)$ as

$$
X(t) = \sum_{n=-\infty}^{\infty} X\!\left(\frac{n}{2W}\right)
\frac{\sin[2\pi W(t - n/2W)]}{2\pi W(t - n/2W)} \tag{3-1-1}
$$

where $\{X(n/2W)\}$ denote the samples of the process $X(t)$ taken at the
sampling (Nyquist) rate of $f_s = 2W$ samples/s. Thus, by applying the sampling
theorem, we may convert the output of an analog source into an equivalent
discrete-time source. Then, the source output is characterized statistically by
the joint pdf $p(x_1, x_2, \ldots, x_m)$ for all $m \ge 1$, where $X_n \equiv X(n/2W)$, $1 \le n \le m$,
are the random variables corresponding to the samples of $X(t)$.

We note that the output samples {X(n/2W)} from the stationary sources are 
generally continuous, and, hence, they cannot be represented in digital form 
without some loss in precision. For example, we may quantize each sample to a 
set of discrete values, but the quantization process results in loss of precision, 
and, consequently, the original signal cannot be reconstructed exactly from the 
quantized sample values. Later in this chapter, we shall consider the distortion 
resulting from quantization of the samples from an analog source. 


3-2 A LOGARITHMIC MEASURE OF INFORMATION 

To develop an appropriate measure of information, let us consider two discrete
random variables with possible outcomes $x_i$, $i = 1, 2, \ldots, n$, and $y_j$, $j =
1, 2, \ldots, m$, respectively. Suppose we observe some outcome $Y = y_j$ and we
wish to determine, quantitatively, the amount of information that the
occurrence of the event $Y = y_j$ provides about the event $X = x_i$, $i = 1, 2, \ldots, n$.
We observe that when $X$ and $Y$ are statistically independent, the occurrence of
$Y = y_j$ provides no information about the occurrence of the event $X = x_i$. On
the other hand, when $X$ and $Y$ are fully dependent such that the occurrence of
$Y = y_j$ determines the occurrence of $X = x_i$, the information content is simply
that provided by the event $X = x_i$. A suitable measure that satisfies these
conditions is the logarithm of the ratio of the conditional probability

$$
P(X = x_i \mid Y = y_j) \equiv P(x_i \mid y_j)
$$

divided by the probability

$$
P(X = x_i) \equiv P(x_i)
$$

That is, the information content provided by the occurrence of the event $Y = y_j$
about the event $X = x_i$ is defined as

$$
I(x_i; y_j) = \log \frac{P(x_i \mid y_j)}{P(x_i)} \tag{3-2-1}
$$

$I(x_i; y_j)$ is called the mutual information between $x_i$ and $y_j$.

The units of $I(x_i; y_j)$ are determined by the base of the logarithm, which is
usually selected as either 2 or $e$. When the base of the logarithm is 2, the units
of $I(x_i; y_j)$ are bits, and when the base is $e$, the units of $I(x_i; y_j)$ are called nats
(natural units). (The standard abbreviation for $\log_e$ is $\ln$.) Since

$$
\ln a = \ln 2\,\log_2 a = 0.69315\,\log_2 a
$$

the information measured in nats is equal to $\ln 2$ times the information
measured in bits.

When the random variables $X$ and $Y$ are statistically independent,
$P(x_i \mid y_j) = P(x_i)$ and, hence, $I(x_i; y_j) = 0$. On the other hand, when the
occurrence of the event $Y = y_j$ uniquely determines the occurrence of the event
$X = x_i$, the conditional probability in the numerator of (3-2-1) is unity and,
hence,

$$
I(x_i; y_j) = \log \frac{1}{P(x_i)} = -\log P(x_i) \tag{3-2-2}
$$

But (3-2-2) is just the information of the event $X = x_i$. For this reason, it is
called the self-information of the event $X = x_i$ and it is denoted as

$$
I(x_i) = \log \frac{1}{P(x_i)} = -\log P(x_i) \tag{3-2-3}
$$

We note that a high-probability event conveys less information than a
low-probability event. In fact, if there is only a single event $x$ with probability
$P(x) = 1$ then $I(x) = 0$. To demonstrate further that the logarithmic measure of
information content is the appropriate one for digital communications, let us
consider the following example.


Example 3-2-1

Suppose we have a discrete information source that emits a binary digit,
either 0 or 1, with equal probability every $\tau_s$ seconds. The information
content of each output from the source is

$$
\begin{aligned}
I(x_i) &= -\log_2 P(x_i), \qquad x_i = 0, 1 \\
&= -\log_2 \tfrac12 = 1 \text{ bit}
\end{aligned}
$$

Now suppose that successive outputs from the source are statistically
independent, i.e., the source is memoryless. Let us consider a block of $k$
binary digits from the source that occurs in a time interval $k\tau_s$. There are
$M = 2^k$ possible $k$-bit blocks, each of which is equally probable with
probability $1/M = 2^{-k}$. The self-information of a $k$-bit block is

$$
I(x_i') = -\log_2 2^{-k} = k \text{ bits}
$$

emitted in a time interval $k\tau_s$. Thus the logarithmic measure of information
content possesses the desired additivity property when a number of source
outputs is considered as a block.

Now let us return to the definition of mutual information given in (3-2-1)
and multiply the numerator and denominator of the ratio of probabilities by
$P(y_j)$. Since

$$
\frac{P(x_i \mid y_j)}{P(x_i)} = \frac{P(x_i \mid y_j)P(y_j)}{P(x_i)P(y_j)}
= \frac{P(x_i, y_j)}{P(x_i)P(y_j)} = \frac{P(y_j \mid x_i)}{P(y_j)}
$$

we conclude that

$$
I(x_i; y_j) = I(y_j; x_i) \tag{3-2-4}
$$

Therefore the information provided by the occurrence of the event $Y = y_j$
about the event $X = x_i$ is identical to the information provided by the
occurrence of the event $X = x_i$ about the event $Y = y_j$.


Example 3-2-2

Suppose that $X$ and $Y$ are binary-valued {0, 1} random variables that
represent the input and output of a binary-input, binary-output channel.
The input symbols are equally likely and the output symbols depend on the
input according to the conditional probabilities

$$
\begin{aligned}
P(Y = 0 \mid X = 0) &= 1 - p_0 \\
P(Y = 1 \mid X = 0) &= p_0 \\
P(Y = 1 \mid X = 1) &= 1 - p_1 \\
P(Y = 0 \mid X = 1) &= p_1
\end{aligned}
$$

Let us determine the mutual information about the occurrence of the events
$X = 0$ and $X = 1$, given that $Y = 0$.

From the probabilities given above, we obtain

$$
\begin{aligned}
P(Y = 0) &= P(Y = 0 \mid X = 0)P(X = 0) + P(Y = 0 \mid X = 1)P(X = 1) \\
&= \tfrac12(1 - p_0 + p_1) \\
P(Y = 1) &= P(Y = 1 \mid X = 0)P(X = 0) + P(Y = 1 \mid X = 1)P(X = 1) \\
&= \tfrac12(1 - p_1 + p_0)
\end{aligned}
$$

Then, the mutual information about the occurrence of the event $X = 0$,
given that $Y = 0$ is observed, is

$$
I(x_1; y_1) = I(0; 0) = \log_2 \frac{P(Y = 0 \mid X = 0)}{P(Y = 0)} = \log_2 \frac{2(1 - p_0)}{1 - p_0 + p_1}
$$

Similarly, given that $Y = 0$ is observed, the mutual information about the
occurrence of the event $X = 1$ is

$$
I(x_2; y_1) \equiv I(1; 0) = \log_2 \frac{2p_1}{1 - p_0 + p_1}
$$

Let us consider some special cases. First, if $p_0 = p_1 = 0$, the channel is called
noiseless and

$$
I(0; 0) = \log_2 2 = 1 \text{ bit}
$$

Hence, the output specifies the input with certainty. On the other hand, if
$p_0 = p_1 = \tfrac12$, the channel is useless because

$$
I(0; 0) = \log_2 1 = 0
$$

However, if $p_0 = p_1 = \tfrac14$ then

$$
\begin{aligned}
I(0; 0) &= \log_2 \tfrac32 = 0.585 \text{ bit} \\
I(0; 1) &= \log_2 \tfrac12 = -1 \text{ bit}
\end{aligned}
$$

In addition to the definition of mutual information and self-information, it is
useful to define the conditional self-information as

$$
I(x_i \mid y_j) = \log \frac{1}{P(x_i \mid y_j)} = -\log P(x_i \mid y_j) \tag{3-2-5}
$$

Then, by combining (3-2-1), (3-2-3), and (3-2-5), we obtain the relationship

$$
I(x_i; y_j) = I(x_i) - I(x_i \mid y_j) \tag{3-2-6}
$$

We interpret $I(x_i \mid y_j)$ as the self-information about the event $X = x_i$ after
having observed the event $Y = y_j$. Since both $I(x_i) \ge 0$ and $I(x_i \mid y_j) \ge 0$, it
follows that $I(x_i; y_j) < 0$ when $I(x_i \mid y_j) > I(x_i)$, and $I(x_i; y_j) > 0$ when $I(x_i \mid y_j) <
I(x_i)$. Hence, the mutual information between a pair of events can be
positive, negative, or zero.


3-2-1 Average Mutual Information and Entropy

Having defined the mutual information associated with the pair of events
$(x_i, y_j)$, which are possible outcomes of the two random variables $X$ and $Y$, we
can obtain the average value of the mutual information by simply weighting
$I(x_i; y_j)$ by the probability of occurrence of the joint event and summing over
all possible joint events. Thus, we obtain

$$
\begin{aligned}
I(X; Y) &= \sum_{i=1}^{n}\sum_{j=1}^{m} P(x_i, y_j)\,I(x_i; y_j) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{m} P(x_i, y_j) \log \frac{P(x_i, y_j)}{P(x_i)P(y_j)}
\end{aligned}
\tag{3-2-7}
$$

as the average mutual information between $X$ and $Y$. We observe that
$I(X; Y) = 0$ when $X$ and $Y$ are statistically independent. An important
characteristic of the average mutual information is that $I(X; Y) \ge 0$ (see
Problem 3-4).

Similarly, we define the average self-information, denoted by $H(X)$, as

$$
\begin{aligned}
H(X) &= \sum_{i=1}^{n} P(x_i)\,I(x_i) \\
&= -\sum_{i=1}^{n} P(x_i) \log P(x_i)
\end{aligned}
\tag{3-2-8}
$$

When $X$ represents the alphabet of possible output letters from a source, $H(X)$
represents the average self-information per source letter, and it is called the
entropy† of the source. In the special case in which the letters from the source
are equally probable, $P(x_i) = 1/n$ for all $i$, and, hence,

$$
H(X) = -\sum_{i=1}^{n} \frac{1}{n} \log \frac{1}{n} = \log n \tag{3-2-9}
$$

In general, $H(X) \le \log n$ (see Problem 3-5) for any given set of source letter
probabilities. In other words, the entropy of a discrete source is a maximum
when the output letters are equally probable.


Example 3-2-3

Consider a source that emits a sequence of statistically independent letters,
where each output letter is either 0 with probability $q$ or 1 with probability
$1 - q$. The entropy of this source is

$$
H(X) \equiv H(q) = -q \log q - (1 - q) \log (1 - q) \tag{3-2-10}
$$

The binary entropy function $H(q)$ is illustrated in Fig. 3-2-1. We observe
that the maximum value of the entropy function occurs at $q = \tfrac12$ where
$H(\tfrac12) = 1$.
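
A short numerical sketch of the binary entropy function is given below. It assumes base-2 logarithms, so the result is in bits, and adopts the usual convention that a zero-probability term contributes nothing; the function name `binary_entropy` is a label chosen for this illustration.

```python
import math

def binary_entropy(q):
    """H(q) = -q log2 q - (1 - q) log2 (1 - q) in bits, with 0 log 0 taken as 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

for q in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(q, binary_entropy(q))
```

The printed values are symmetric about q = 1/2 and peak there, in line with the observation about the maximum of the entropy function.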


The average conditional self-information is called the conditional entropy
and is defined as

$$
H(X \mid Y) = \sum_{i=1}^{n}\sum_{j=1}^{m} P(x_i, y_j) \log \frac{1}{P(x_i \mid y_j)} \tag{3-2-11}
$$

We interpret $H(X \mid Y)$ as the information or uncertainty in $X$ after $Y$ is
observed. By combining (3-2-7), (3-2-8), and (3-2-11) we obtain the
relationship

$$
I(X; Y) = H(X) - H(X \mid Y) \tag{3-2-12}
$$

†The term entropy is taken from statistical mechanics (thermodynamics), where a function
similar to (3-2-8) is called (thermodynamic) entropy.

FIGURE 3-2-1 Binary entropy function.

Since $I(X; Y) \ge 0$, it follows that $H(X) \ge H(X \mid Y)$, with equality if and
only if $X$ and $Y$ are statistically independent. If we interpret $H(X \mid Y)$ as the
average amount of (conditional self-information) uncertainty in $X$ after we
observe $Y$, and $H(X)$ as the average amount of uncertainty (self-information)
prior to the observation, then $I(X; Y)$ is the average amount of (mutual
information) uncertainty provided about the set $X$ by the observation of the set
$Y$. Since $H(X) \ge H(X \mid Y)$, it is clear that conditioning on the observation $Y$
does not increase the entropy.


Example 3-2-4 

Let us evaluate $H(X \mid Y)$ and $I(X; Y)$ for the binary-input, binary-output
channel treated previously in Example 3-2-2 for the case where
$p_0 = p_1 = p$. Let the probabilities of the input symbols be $P(X = 0) = q$ and
$P(X = 1) = 1 - q$. Then the entropy is

$$H(X) \equiv H(q) = -q \log q - (1 - q) \log (1 - q)$$

where $H(q)$ is the binary entropy function and the conditional entropy
$H(X \mid Y)$ is defined by (3-2-11). A plot of $H(X \mid Y)$ as a function of $q$ with
$p$ as a parameter is shown in Fig. 3-2-2. The average mutual information
$I(X; Y)$ is plotted in Fig. 3-2-3.

FIGURE 3-2-2 Conditional entropy for binary-input, binary-output symmetric channel.

FIGURE 3-2-3 Average mutual information for binary-input, binary-output symmetric channel.

As in the preceding example, when the conditional entropy $H(X \mid Y)$ is
viewed in terms of a channel whose input is $X$ and whose output is $Y$,
$H(X \mid Y)$ is called the equivocation and is interpreted as the amount of average
uncertainty remaining in $X$ after observation of $Y$.
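The quantities in Example 3-2-4 can be evaluated numerically. The sketch below assumes that $p$ is the probability that the channel output differs from the input, so that $H(Y \mid X) = H(p)$; it obtains $I(X; Y)$ as $H(Y) - H(Y \mid X)$ and then the equivocation from (3-2-12). The function names are hypothetical.

```python
import math

def binary_entropy(q):
    """H(q) in bits, with H(0) = H(1) = 0 by convention."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def symmetric_channel_information(q, p):
    """Return (H(X), H(X|Y), I(X;Y)) in bits for P(X=0)=q and crossover probability p."""
    h_x = binary_entropy(q)
    p_y0 = q * (1 - p) + (1 - q) * p      # P(Y = 0)
    mutual = binary_entropy(p_y0) - binary_entropy(p)   # H(Y) - H(Y|X)
    equivocation = h_x - mutual           # H(X|Y) from (3-2-12)
    return h_x, equivocation, mutual

for p in (0.0, 0.1, 0.5):
    h_x, h_x_given_y, mutual = symmetric_channel_information(0.5, p)
    print(f"p={p}: H(X)={h_x:.4f}  H(X|Y)={h_x_given_y:.4f}  I(X;Y)={mutual:.4f}")
```

With $p = 0$ the equivocation vanishes and all of $H(X)$ is conveyed; with $p = \tfrac{1}{2}$ the output is independent of the input and the mutual information is zero.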








The results given above can be generalized to more than two random
variables. In particular, suppose we have a block of $k$ random variables
$X_1 X_2 \cdots X_k$, with joint probability
$P(x_1 x_2 \cdots x_k) = P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k)$.
Then, the entropy for the block is defined as

$$H(X_1 X_2 \cdots X_k) = -\sum_{j_1=1}^{n_1} \sum_{j_2=1}^{n_2} \cdots \sum_{j_k=1}^{n_k} P(x_{j_1} x_{j_2} \cdots x_{j_k}) \log P(x_{j_1} x_{j_2} \cdots x_{j_k}) \tag{3-2-13}$$

Since the joint probability $P(x_1 x_2 \cdots x_k)$ can be factored as

$$P(x_1 x_2 \cdots x_k) = P(x_1) P(x_2 \mid x_1) P(x_3 \mid x_1 x_2) \cdots P(x_k \mid x_1 x_2 \cdots x_{k-1}) \tag{3-2-14}$$

it follows that

$$H(X_1 X_2 X_3 \cdots X_k) = H(X_1) + H(X_2 \mid X_1) + H(X_3 \mid X_1 X_2) + \cdots + H(X_k \mid X_1 \cdots X_{k-1}) = \sum_{i=1}^{k} H(X_i \mid X_1 X_2 \cdots X_{i-1}) \tag{3-2-15}$$

By applying the result $H(X) \geq H(X \mid Y)$, where $X = X_m$ and
$Y = X_1 X_2 \cdots X_{m-1}$, in (3-2-15) we obtain

$$H(X_1 X_2 \cdots X_k) \leq \sum_{m=1}^{k} H(X_m) \tag{3-2-16}$$

with equality if and only if the random variables $X_1, X_2, \ldots, X_k$ are
statistically independent.
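The chain rule (3-2-15) and the bound (3-2-16) can be checked numerically for $k = 2$. The sketch below uses a made-up joint probability table for two dependent binary variables; the table and all variable names are assumptions for illustration.

```python
import math

def entropy_bits(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical joint pmf P(x1, x2) of two dependent binary random variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Marginals P(x1) and P(x2).
p_x1 = {a: sum(p for (i, _), p in joint.items() if i == a) for a in (0, 1)}
p_x2 = {b: sum(p for (_, j), p in joint.items() if j == b) for b in (0, 1)}

h_joint = entropy_bits(joint.values())   # H(X1 X2)
h_x1 = entropy_bits(p_x1.values())       # H(X1)
h_x2 = entropy_bits(p_x2.values())       # H(X2)

# H(X2 | X1) = -sum P(x1, x2) log2 P(x2 | x1)
h_x2_given_x1 = -sum(p * math.log2(p / p_x1[i]) for (i, _), p in joint.items())

print("H(X1 X2)         =", h_joint)
print("H(X1) + H(X2|X1) =", h_x1 + h_x2_given_x1)
print("H(X1) + H(X2)    =", h_x1 + h_x2)
```

The first two printed values agree, as (3-2-15) requires, and the third is larger because the two variables in this table are not independent.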


3-2-2 Information Measures for Continuous 
Random Variables 

The definition of mutual information given above for discrete random variables 
may be extended in a straightforward manner to continuous random variables. 
In particular, if X and Y are random variables with joint pdf p(x, y) and 
marginal pdfs p(x) and p(y), the average mutual information between X and 
Y is defined as 


$$I(X; Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x) p(y \mid x) \log \frac{p(y \mid x) p(x)}{p(x) p(y)} \, dx \, dy \tag{3-2-17}$$


Although the definition of the average mutual information carries over to
continuous random variables, the concept of self-information does not. The
problem is that a continuous random variable requires an infinite number of
binary digits to represent it exactly. Hence, its self-information is infinite and,
therefore, its entropy is also infinite. Nevertheless, we shall define a quantity
that we call the differential entropy of the continuous random variable $X$ as

$$H(X) = -\int_{-\infty}^{\infty} p(x) \log p(x) \, dx \tag{3-2-18}$$

We emphasize that this quantity does not have the physical meaning of
self-information, although it may appear to be a natural extension of the
definition of entropy for a discrete random variable (see Problem 3-6).

By defining the average conditional entropy of $X$ given $Y$ as

$$H(X \mid Y) = -\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x, y) \log p(x \mid y) \, dx \, dy \tag{3-2-19}$$

the average mutual information may be expressed as

$$I(X; Y) = H(X) - H(X \mid Y)$$

or, alternatively, as

$$I(X; Y) = H(Y) - H(Y \mid X)$$

In some cases of practical interest, the random variable $X$ is discrete and $Y$
is continuous. To be specific, suppose that $X$ has possible outcomes $x_i$,
$i = 1, 2, \ldots, n$, and $Y$ is described by its marginal pdf $p(y)$. When $X$ and $Y$ are
statistically dependent, we may express $p(y)$ as

$$p(y) = \sum_{i=1}^{n} p(y \mid x_i) P(x_i)$$


The mutual information provided about the event $X = x_i$ by the occurrence of
the event $Y = y$ is

$$I(x_i; y) = \log \frac{p(y \mid x_i) P(x_i)}{p(y) P(x_i)} = \log \frac{p(y \mid x_i)}{p(y)} \tag{3-2-20}$$

Then, the average mutual information between $X$ and $Y$ is

$$I(X; Y) = \sum_{i=1}^{n} \int_{-\infty}^{\infty} p(y \mid x_i) P(x_i) \log \frac{p(y \mid x_i)}{p(y)} \, dy \tag{3-2-21}$$
Example 3-2-5

Suppose that $X$ is a discrete random variable with two equally probable
outcomes $x_1 = A$ and $x_2 = -A$. Let the conditional pdfs $p(y \mid x_i)$, $i = 1, 2$, be
gaussian with mean $x_i$ and variance $\sigma^2$. That is,

$$p(y \mid A) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(y - A)^2 / 2\sigma^2}$$

$$p(y \mid -A) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(y + A)^2 / 2\sigma^2} \tag{3-2-22}$$

The average mutual information obtained from (3-2-21) becomes

$$I(X; Y) = \frac{1}{2} \int_{-\infty}^{\infty} \left[ p(y \mid A) \log \frac{p(y \mid A)}{p(y)} + p(y \mid -A) \log \frac{p(y \mid -A)}{p(y)} \right] dy \tag{3-2-23}$$

where

$$p(y) = \frac{1}{2} \left[ p(y \mid A) + p(y \mid -A) \right] \tag{3-2-24}$$

In Chapter 7, it will be shown that the average mutual information $I(X; Y)$
given by (3-2-23) represents the channel capacity of a binary-input additive
white gaussian noise channel.
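The integral in (3-2-23) has no simple closed form, but it is easy to evaluate numerically. The following sketch uses a plain midpoint rule over a finite interval; the integration limits of ten standard deviations beyond $\pm A$, the step count, and the function names are all assumptions made for this illustration, and the result is expressed in bits.

```python
import math

def gaussian_pdf(y, mean, sigma):
    return math.exp(-((y - mean) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def mutual_information_bits(amplitude, sigma, steps=20000):
    """Midpoint-rule approximation of (3-2-23) with base-2 logarithms."""
    low = -amplitude - 10 * sigma
    high = amplitude + 10 * sigma
    dy = (high - low) / steps
    total = 0.0
    for k in range(steps):
        y = low + (k + 0.5) * dy
        p_plus = gaussian_pdf(y, amplitude, sigma)    # p(y | A)
        p_minus = gaussian_pdf(y, -amplitude, sigma)  # p(y | -A)
        p_y = 0.5 * (p_plus + p_minus)                # (3-2-24)
        for p_cond in (p_plus, p_minus):
            if p_cond > 0.0:                          # 0 * log 0 is taken as 0
                total += 0.5 * p_cond * math.log2(p_cond / p_y) * dy
    return total

for a in (0.0, 1.0, 5.0):
    print(f"A/sigma = {a}: I(X;Y) is about {mutual_information_bits(a, 1.0):.4f} bits")
```

As expected, the result is zero when $A = 0$ (the two conditional pdfs coincide) and approaches 1 bit when $A$ is large compared with $\sigma$.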


3-3 CODING FOR DISCRETE SOURCES 

In Section 3-2 we introduced a measure for the information content associated 
with a discrete random variable X. When X is the output of a discrete source, 
the entropy H(X ) of the source represents the average amount of information 
emitted by the source. In this section, we consider the process of encoding the 
output of a source, i.e., the process of representing the source output by a 
sequence of binary digits. A measure of the efficiency of a source-encoding 
method can be obtained by comparing the average number of binary digits per 
output letter from the source to the entropy H(X). 

The encoding of a discrete source having a finite alphabet size may appear, 
at first glance, to be a relatively simple problem. However, this is true only 
when the source is memoryless, i.e., when successive symbols from the source 
are statistically independent and each symbol is encoded separately. The 
discrete memoryless source (DMS) is by far the simplest model that can be 
devised for a physical source. Few physical sources, however, closely fit this 
idealized mathematical model. For example, successive output letters from a 
machine printing English text are expected to be statistically dependent. On 
the other hand, if the machine output is a computer program coded in Fortran, 
the sequence of output letters is expected to exhibit a much smaller 
dependence. In any case, we shall demonstrate that it is always more efficient 
to encode blocks of symbols instead of encoding each symbol separately. By
making the block size sufficiently large, the average number of binary digits
per output letter from the source can be made arbitrarily close to the entropy
of the source.


3-3-1 Coding for Discrete Memoryless Sources 

Suppose that a DMS produces an output letter or symbol every $\tau_s$ seconds.
Each symbol is selected from a finite alphabet of symbols $x_i$, $i = 1, 2, \ldots, L$,
occurring with probabilities $P(x_i)$, $i = 1, 2, \ldots, L$. The entropy of the DMS in
bits per source symbol is

$$H(X) = -\sum_{i=1}^{L} P(x_i) \log_2 P(x_i) \leq \log_2 L \tag{3-3-1}$$

where equality holds when the symbols are equally probable. The average
number of bits per source symbol is $H(X)$ and the source rate in bits/s is
defined as $H(X)/\tau_s$.

Fixed-Length Code Words First we consider a block encoding scheme
that assigns a unique set of $R$ binary digits to each symbol. Since there are $L$
possible symbols, the number of binary digits per symbol required for unique
encoding when $L$ is a power of 2 is

$$R = \log_2 L \tag{3-3-2}$$

and, when $L$ is not a power of 2, it is

$$R = \lfloor \log_2 L \rfloor + 1 \tag{3-3-3}$$

where $\lfloor x \rfloor$ denotes the largest integer less than $x$. The code rate in bits per
symbol is now $R$ and, since $H(X) \leq \log_2 L$, it follows that $R \geq H(X)$.

The efficiency of the encoding for the DMS is defined as the ratio $H(X)/R$.
We observe that when $L$ is a power of 2 and the source letters are equally
probable, $R = H(X)$. Hence, a fixed-length code of $R$ bits per symbol attains
100% efficiency. However, if $L$ is not a power of 2 but the source symbols are
still equally probable, $R$ differs from $H(X)$ by at most 1 bit per symbol. When
$\log_2 L \gg 1$, the efficiency of this encoding scheme is high. On the other hand,
when $L$ is small, the efficiency of the fixed-length code can be increased by
encoding a sequence of $J$ symbols at a time. To accomplish the desired
encoding, we require $L^J$ unique code words. By using sequences of $N$ binary
digits, we can accommodate $2^N$ possible code words. $N$ must be selected such
that

$$N \geq J \log_2 L$$

Hence, the minimum integer value of $N$ required is

$$N = \lfloor J \log_2 L \rfloor + 1 \tag{3-3-4}$$

Now the average number of bits per source symbol is $N/J = R$, and, thus, the
inefficiency has been reduced by approximately a factor of $1/J$ relative to the
symbol-by-symbol encoding described above. By making $J$ sufficiently large,
the efficiency of the encoding procedure, measured by the ratio $JH(X)/N$, can
be made as close to unity as desired.
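The effect of the block length $J$ on a fixed-length code can be tabulated directly. The sketch below assumes equally probable letters, so that $H(X) = \log_2 L$, and finds the smallest $N$ with $2^N \geq L^J$ using exact integer arithmetic; this agrees with (3-3-4) whenever $L^J$ is not a power of 2. The function name and the choice $L = 5$ are illustrative assumptions.

```python
import math

def fixed_length_block_code(alphabet_size, block_length):
    """Return (N, R, efficiency) for blocks of equally probable letters."""
    num_blocks = alphabet_size ** block_length          # L**J code words needed
    n_bits = max(1, (num_blocks - 1).bit_length())      # smallest N with 2**N >= L**J
    rate = n_bits / block_length                        # R = N / J bits per symbol
    efficiency = block_length * math.log2(alphabet_size) / n_bits  # J*H(X)/N
    return n_bits, rate, efficiency

for block_length in (1, 2, 3, 10):
    n_bits, rate, efficiency = fixed_length_block_code(5, block_length)
    print(f"J={block_length:2d}  N={n_bits:2d}  R={rate:.3f}  efficiency={efficiency:.3f}")
```

For $L = 5$ the single-symbol code needs 3 bits per letter, while blocks of three letters need only 7 bits per block. The efficiency does not rise monotonically with $J$, because $N$ must be an integer, but the gap to unity is bounded by roughly $1/J$ bits per symbol.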

The encoding methods described above introduce no distortion since the 
encoding of source symbols or blocks of symbols into code words is unique. 
This type of encoding is called noiseless. 

Now, suppose we attempt to reduce the code rate $R$ by relaxing the
condition that the encoding process be unique. For example, suppose that only
a fraction of the $L^J$ blocks of symbols is encoded uniquely. To be specific, let
us select the $2^N - 1$ most probable $J$-symbol blocks and encode each of them
uniquely, while the remaining $L^J - (2^N - 1)$ $J$-symbol blocks are represented
by the single remaining code word. This procedure results in a decoding failure
(or distortion) every time a low-probability block is
mapped into this single code word. Let $P_e$ denote the probability of such an error.
Based on this block encoding procedure, Shannon (1948a) proved the
following source coding theorem.


Source Coding Theorem I 

Let $X$ be the ensemble of letters from a DMS with finite entropy $H(X)$.
Blocks of $J$ symbols from the source are encoded into code words of length
$N$ from a binary alphabet. For any $\varepsilon > 0$, the probability $P_e$ of a block
decoding failure can be made arbitrarily small if

$$R \equiv \frac{N}{J} \geq H(X) + \varepsilon \tag{3-3-5}$$

and $J$ is sufficiently large. Conversely, if

$$R \leq H(X) - \varepsilon \tag{3-3-6}$$

then $P_e$ becomes arbitrarily close to 1 as $J$ is made sufficiently large.


From this theorem, we observe that the average number of bits per symbol
required to encode the output of a DMS with arbitrarily small probability of
decoding failure is lower bounded by the source entropy $H(X)$. On the other
hand, if $R < H(X)$, the decoding failure rate approaches 100% as $J$ is
arbitrarily increased.


Variable-Length Code Words When the source symbols are not equally
probable, a more efficient encoding method is to use variable-length code
words. An example of such encoding is the Morse code, which dates back to
the nineteenth century. In the Morse code, the letters that occur more
frequently are assigned short code words and those that occur infrequently are
assigned long code words. Following this general philosophy, we may use the
probabilities of occurrence of the different source letters in the selection of the
code words. The problem is to devise a method for selecting and assigning
the code words to source letters. This type of encoding is called entropy
coding.

TABLE 3-3-1 VARIABLE-LENGTH CODES

Letter   P(a_k)   Code I   Code II   Code III
a_1      1/2      1        0         0
a_2      1/4      00       10        01
a_3      1/8      01       110       011
a_4      1/8      10       111       111

For example, suppose that a DMS with output letters $a_1, a_2, a_3, a_4$ and
corresponding probabilities $P(a_1) = \tfrac{1}{2}$, $P(a_2) = \tfrac{1}{4}$, and
$P(a_3) = P(a_4) = \tfrac{1}{8}$ is
encoded as shown in Table 3-3-1. Code I is a variable-length code that has a
basic flaw. To see the flaw, suppose we are presented with the sequence
001001. Clearly, the first symbol corresponding to 00 is $a_2$. However, the
next four bits are ambiguous (not uniquely decodable). They may be decoded
either as $a_4 a_3$ or as $a_1 a_2 a_1$. Perhaps the ambiguity can be resolved by waiting
for additional bits, but such a decoding delay is highly undesirable. We shall
only consider codes that are decodable instantaneously, that is, without any
decoding delay.

Code II in Table 3-3-1 is uniquely decodable and instantaneously decodable.
It is convenient to represent the code words in this code graphically as terminal
nodes of a tree, as shown in Fig. 3-3-1. We observe that the digit 0 indicates the
end of a code word for the first three code words. This characteristic plus the
fact that no code word is longer than three binary digits makes this code
instantaneously decodable. Note that no code word in this code is a prefix of
any other code word. In general, the prefix condition requires that for a given
code word $C_k$ of length $k$ having elements $(b_1, b_2, \ldots, b_k)$, there is no other
code word of length $l < k$ with elements $(b_1, b_2, \ldots, b_l)$ for $1 \leq l \leq k - 1$. In
other words, there is no code word of length $l < k$ that is identical to the first $l$
binary digits of another code word of length $k > l$. This property makes the
code words instantaneously decodable.

FIGURE 3-3-1 Code tree for code II in Table 3-3-1.

Code III given in Table 3-3-1 has the tree structure shown in Fig. 3-3-2. We
note that in this case the code is uniquely decodable but not instantaneously
decodable. Clearly, this code does not satisfy the prefix condition.

FIGURE 3-3-2 Code tree for code III in Table 3-3-1.
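The prefix condition and the ambiguity of Code I are easy to test mechanically. The sketch below transcribes the three codes of Table 3-3-1 as lists of bit strings; the helper names `satisfies_prefix_condition` and `count_parsings` are hypothetical. The second helper counts, by dynamic programming, the number of ways a bit string can be split into code words.

```python
def satisfies_prefix_condition(code_words):
    """True if no code word is a prefix of a different code word."""
    return not any(a != b and b.startswith(a) for a in code_words for b in code_words)

def count_parsings(bits, code_words):
    """Number of distinct ways to split `bits` into a sequence of code words."""
    ways = [1] + [0] * len(bits)          # ways[n] = parsings of the first n bits
    for end in range(1, len(bits) + 1):
        for word in code_words:
            start = end - len(word)
            if start >= 0 and bits[start:end] == word:
                ways[end] += ways[start]
    return ways[len(bits)]

# Code words for a_1, a_2, a_3, a_4 as listed in Table 3-3-1.
code_1 = ["1", "00", "01", "10"]
code_2 = ["0", "10", "110", "111"]
code_3 = ["0", "01", "011", "111"]

for name, code in (("I", code_1), ("II", code_2), ("III", code_3)):
    print("Code", name, "prefix condition:", satisfies_prefix_condition(code))
print("Parsings of 001001 under Code I:", count_parsings("001001", code_1))
```

Only Code II passes the prefix test, and the sequence 001001 admits two parsings under Code I, matching the two decodings described for that code.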

Our main objective is to devise a systematic procedure for constructing
uniquely decodable variable-length codes that are efficient in the sense that the
average number of bits per source letter, defined as the quantity

$$R = \sum_{k=1}^{L} n_k P(a_k) \tag{3-3-7}$$

is minimized. The conditions for the existence of a code that satisfies the prefix
condition are given by the Kraft inequality.


Kraft Inequality A necessary and sufficient condition for the existence of a
binary code with code words having lengths $n_1 \leq n_2 \leq \cdots \leq n_L$ that satisfy the
prefix condition is

$$\sum_{k=1}^{L} 2^{-n_k} \leq 1 \tag{3-3-8}$$

First, we prove that (3-3-8) is a sufficient condition for the existence of a
code that satisfies the prefix condition. To construct such a code, we begin with
a full binary tree of order $n = n_L$ that has $2^n$ terminal nodes and two nodes of
order $k$ stemming from each node of order $k - 1$, for each $k$, $1 \leq k \leq n$. Let us
select any node of order $n_1$ as the first code word $C_1$. This choice eliminates
$2^{n - n_1}$ terminal nodes (or the fraction $2^{-n_1}$ of the $2^n$ terminal nodes). From the
remaining available nodes of order $n_2$, we select one node for the second code
word $C_2$. This choice eliminates $2^{n - n_2}$ terminal nodes (or the fraction $2^{-n_2}$ of
the $2^n$ terminal nodes). This process continues until the last code word is
assigned at terminal node $n = n_L$. Since, at the node of order $j < L$, the
fraction of the number of terminal nodes eliminated is

$$\sum_{k=1}^{j} 2^{-n_k} < \sum_{k=1}^{L} 2^{-n_k} \leq 1$$

there is always a node of order $k > j$ available to be assigned to the next code
word. Thus, we have constructed a code tree that is embedded in the full tree
of $2^n$ nodes as illustrated in Fig. 3-3-3, for a tree having 16 terminal nodes and
a source output consisting of five letters with $n_1 = 1$, $n_2 = 2$, $n_3 = 3$, and
$n_4 = n_5 = 4$.

FIGURE 3-3-3 Construction of a binary tree code embedded in a full tree.

To prove that (3-3-8) is a necessary condition, we observe that in the code
tree of order $n = n_L$, the number of terminal nodes eliminated from the total
number of $2^n$ terminal nodes is

$$\sum_{k=1}^{L} 2^{n - n_k} \leq 2^n$$

Hence,

$$\sum_{k=1}^{L} 2^{-n_k} \leq 1$$

and the proof of (3-3-8) is complete.
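The sufficiency argument is constructive, and one way to mechanize it is sketched below. Given code word lengths that satisfy (3-3-8), the function assigns code words in order of increasing length by counting upward in binary, which amounts to always picking the leftmost available node of the tree. This particular assignment rule and the function names are assumptions for illustration; the proof itself allows any available node to be chosen.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Exact value of the sum of 2**(-n_k) over all code word lengths."""
    return sum(Fraction(1, 2 ** n) for n in lengths)

def prefix_code_from_lengths(lengths):
    """Build a prefix code with the given positive lengths, or raise ValueError."""
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    code_words = []
    value = 0          # next unused node, read as a binary number
    previous = 0       # length of the previously assigned code word
    for n in sorted(lengths):
        value <<= (n - previous)                     # descend to order n in the tree
        code_words.append(format(value, "0{}b".format(n)))
        value += 1                                   # skip the subtree just used
        previous = n
    return code_words

lengths = [1, 2, 3, 4, 4]   # the five-letter case illustrated in Fig. 3-3-3
print("Kraft sum:", kraft_sum(lengths))
print("Code words:", prefix_code_from_lengths(lengths))
```

For the lengths of Fig. 3-3-3 the Kraft sum is exactly 1, so every terminal node of the 16-leaf tree is accounted for by the five code words.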

The Kraft inequality may be used to prove the following (noiseless) source 
coding theorem, which applies to codes that satisfy the prefix condition. 


Source Coding Theorem II 

Let $X$ be the ensemble of letters from a DMS with finite entropy $H(X)$, and
output letters $x_k$, $1 \leq k \leq L$, with corresponding probabilities of occurrence
$p_k$, $1 \leq k \leq L$. It is possible to construct a code that satisfies the prefix
condition and has an average length $R$ that satisfies the inequalities

$$H(X) \leq R < H(X) + 1 \tag{3-3-9}$$
To establish the lower bound in (3-3-9), we note that for code words that
have length $n_k$, $1 \leq k \leq L$, the difference $H(X) - R$ may be expressed as

$$H(X) - R = \sum_{k=1}^{L} p_k \log_2 \frac{1}{p_k} - \sum_{k=1}^{L} p_k n_k = \sum_{k=1}^{L} p_k \log_2 \frac{2^{-n_k}}{p_k} \tag{3-3-10}$$

Use of the inequality $\ln x \leq x - 1$ in (3-3-10) yields

$$H(X) - R \leq (\log_2 e) \sum_{k=1}^{L} p_k \left( \frac{2^{-n_k}}{p_k} - 1 \right) \leq (\log_2 e) \left( \sum_{k=1}^{L} 2^{-n_k} - 1 \right) \leq 0$$

where the last inequality follows from the Kraft inequality. Equality holds if
and only if $p_k = 2^{-n_k}$ for $1 \leq k \leq L$.

The upper bound in (3-3-9) may be established under the constraint that $n_k$,
$1 \leq k \leq L$, are integers, by selecting the $\{n_k\}$ such that
$2^{-n_k} \leq p_k < 2^{-n_k + 1}$. But if
the terms $p_k \geq 2^{-n_k}$ are summed over $1 \leq k \leq L$, we obtain the Kraft inequality,
for which we have demonstrated that there exists a code that satisfies the prefix
condition. On the other hand, if we take the logarithm of $p_k < 2^{-n_k + 1}$, we
obtain

$$\log_2 p_k < -n_k + 1$$

or, equivalently,

$$n_k < 1 - \log_2 p_k \tag{3-3-11}$$

If we multiply both sides of (3-3-11) by $p_k$ and sum over $1 \leq k \leq L$, we obtain
the desired upper bound given in (3-3-9). This completes the proof of (3-3-9).
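The length selection used in the proof of the upper bound is equivalent to choosing $n_k$ as the smallest integer not less than $-\log_2 p_k$. The sketch below applies that rule to a seven-letter probability set (the same values used in Example 3-3-1) and checks the Kraft inequality and the bounds of (3-3-9). The function name `proof_lengths` is hypothetical, and the resulting lengths are generally not the optimum ones.

```python
import math

def proof_lengths(probabilities):
    """Lengths n_k with 2**(-n_k) <= p_k < 2**(-n_k + 1)."""
    return [math.ceil(-math.log2(p)) for p in probabilities]

probabilities = [0.35, 0.30, 0.20, 0.10, 0.04, 0.005, 0.005]
lengths = proof_lengths(probabilities)

entropy = -sum(p * math.log2(p) for p in probabilities)
average_length = sum(p * n for p, n in zip(probabilities, lengths))
kraft = sum(2.0 ** -n for n in lengths)

print("lengths:", lengths)
print("Kraft sum:", kraft)
print(f"H(X) = {entropy:.3f} <= R = {average_length:.3f} < H(X) + 1 = {entropy + 1:.3f}")
```

The average length obtained this way satisfies (3-3-9) but leaves room for improvement, which is what the Huffman procedure provides.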

We have now established that variable length codes that satisfy the prefix 
condition are efficient source codes for any DMS with source symbols that are 
not equally probable. Let us now describe an algorithm for constructing such 
codes. 


Huffman Coding Algorithm Huffman (1952) devised a variable-length
encoding algorithm, based on the source letter probabilities $P(x_i)$,
$i = 1, 2, \ldots, L$. This algorithm is optimum in the sense that the average number of
binary digits required to represent the source symbols is a minimum, subject to
the constraint that the code words satisfy the prefix condition, as defined
above, which allows the received sequence to be uniquely and instantaneously
decodable. We illustrate this encoding algorithm by means of two examples.
FIGURE 3-3-4 An example of variable-length source encoding for a DMS.
(Code tree not reproduced; the merged node probabilities are 0.01, 0.05, 0.15, 0.35, and 0.65.)

Letter   Probability   Self-information   Code
x_1      0.35          1.5146             00
x_2      0.30          1.7370             01
x_3      0.20          2.3219             10
x_4      0.10          3.3219             110
x_5      0.04          4.6439             1110
x_6      0.005         7.6439             11110
x_7      0.005         7.6439             11111

H(X) = 2.11     R = 2.21

Example 3-3-1

Consider a DMS with seven possible symbols $x_1, x_2, \ldots, x_7$ having the
probabilities of occurrence illustrated in Fig. 3-3-4. We have ordered the
source symbols in decreasing order of the probabilities, i.e.,
$P(x_1) \geq P(x_2) \geq \cdots \geq P(x_7)$. We begin the encoding process with the two least probable
symbols $x_6$ and $x_7$. These two symbols are tied together as shown in Fig.
3-3-4, with the upper branch assigned a 0 and the lower branch assigned a 1.
The probabilities of these two branches are added together at the node
where the two branches meet to yield the probability 0.01. Now we have the
source symbols $x_1, \ldots, x_5$ plus a new symbol, say $x_6'$, obtained by combining
$x_6$ and $x_7$. The next step is to join the two least probable symbols from the
set $x_1, x_2, x_3, x_4, x_5, x_6'$. These are $x_5$ and $x_6'$, which have a combined
probability of 0.05. The branch from $x_5$ is assigned a 0 and the branch from
$x_6'$ is assigned a 1. This procedure continues until we exhaust the set of
possible source letters. The result is a code tree with branches that contain
the desired code words. The code words are obtained by beginning at the
rightmost node in the tree and proceeding to the left. The resulting code
words are listed in Fig. 3-3-4. The average number of binary digits per
symbol for this code is $R = 2.21$ bits/symbol. The entropy of the source is
2.11 bits/symbol.
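The merging procedure of Example 3-3-1 can be sketched with a priority queue. The code below is a minimal illustration, not the exact construction drawn in Fig. 3-3-4: the labels `x1` to `x7`, the function name, and the way ties and the 0/1 branch labels are resolved are all assumptions, so the individual code words may differ from those in the figure even though the average length is the same.

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Build a binary Huffman code for a dict {symbol: probability}."""
    order = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(order), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_low, _, group_low = heapq.heappop(heap)     # least probable node
        p_next, _, group_next = heapq.heappop(heap)   # second least probable node
        # Prepend one more bit to every code word below the new node.
        merged = {s: "0" + c for s, c in group_next.items()}
        merged.update({s: "1" + c for s, c in group_low.items()})
        heapq.heappush(heap, (p_low + p_next, next(order), merged))
    return heap[0][2]

source = {"x1": 0.35, "x2": 0.30, "x3": 0.20, "x4": 0.10,
          "x5": 0.04, "x6": 0.005, "x7": 0.005}
code = huffman_code(source)
average_length = sum(source[s] * len(code[s]) for s in source)

for symbol in source:
    print(symbol, code[symbol])
print("average length:", round(average_length, 4), "bits/symbol")
```

Whichever way the tie at probability 0.35 is broken, the average length stays at 2.21 bits per symbol.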

We make the observation that the code is not necessarily unique. For
example, at the next to the last step in the encoding procedure, we have a tie
between $x_1$ and $x_3'$, since these symbols are equally probable. At this point, we
chose to pair $x_1$ with $x_2$. An alternative is to pair $x_2$ with $x_3'$. If we choose this
pairing, the resulting code is illustrated in Fig. 3-3-5. The average number of
bits per source symbol for this code is also 2.21. Hence, the resulting codes are
equally efficient. Secondly, the assignment of a 0 to the upper branch and a 1
to the lower (less probable) branch is arbitrary. We may simply reverse the
assignment of a 0 and 1 and still obtain an efficient code satisfying the prefix
condition.

FIGURE 3-3-5 An alternative code for the DMS in Example 3-3-1.

Letter   Code
x_1      0
x_2      10
x_3      110
x_4      1110
x_5      11110
x_6      111110
x_7      111111

R = 2.21


Example 3-3-2

As a second example, let us determine the Huffman code for the output of a
DMS illustrated in Fig. 3-3-6. The entropy of this source is $H(X) = 2.63$ bits/symbol.
The Huffman code as illustrated in Fig. 3-3-6 has an
average length of $R = 2.70$ bits/symbol. Hence, its efficiency is 0.97.

FIGURE 3-3-6 Huffman code for Example 3-3-2.

Letter   Code
x_1      00
x_2      010
x_3      011
x_4      100
x_5      101
x_6      110
x_7      1110
x_8      1111

H(X) = 2.63     R = 2.70
The variable-length encoding (Huffman) algorithm described in the above
examples generates a prefix code having an $R$ that satisfies (3-3-9). However,
instead of encoding on a symbol-by-symbol basis, a more efficient procedure is
to encode blocks of $J$ symbols at a time. In such a case, the bounds in (3-3-9) of
source coding theorem II become

$$J H(X) \leq R_J < J H(X) + 1 \tag{3-3-12}$$

since the entropy of a $J$-symbol block from a DMS is $JH(X)$, and $R_J$ is the
average number of bits per $J$-symbol block. If we divide (3-3-12) by $J$, we
obtain

$$H(X) \leq \frac{R_J}{J} < H(X) + \frac{1}{J} \tag{3-3-13}$$

where $R_J / J \equiv R$ is the average number of bits per source symbol. Hence $R$ can
be made as close to $H(X)$ as desired by selecting $J$ sufficiently large.


Example 3-3-3 

The output of a DMS consists of letters $x_1$, $x_2$, and $x_3$ with probabilities 0.45,
0.35, and 0.20, respectively. The entropy of this source is $H(X) = 1.518$ bits/symbol.
The Huffman code for this source, given in Table 3-3-2,
requires $R_1 = 1.55$ bits/symbol and results in an efficiency of 97.9%. If pairs
of symbols are encoded by means of the Huffman algorithm, the resulting
code is as given in Table 3-3-3. The entropy of the source output for pairs of
letters is $2H(X) = 3.036$ bits/symbol pair. On the other hand, the Huffman
code requires $R_2 = 3.0675$ bits/symbol pair. Thus, the efficiency of the
encoding increases to $2H(X)/R_2 = 0.990$ or, equivalently, to 99.0%.

TABLE 3-3-2 HUFFMAN CODE FOR EXAMPLE 3-3-3

Letter   Probability   Self-information   Code
x_1      0.45          1.156              1
x_2      0.35          1.520              00
x_3      0.20          2.330              01

H(X) = 1.518 bits/letter
R_1 = 1.55 bits/letter
Efficiency = 97.9%


TABLE 3-3-3 HUFFMAN CODE FOR ENCODING PAIRS OF LETTERS

Letter pair   Probability   Self-information   Code
x_1 x_1       0.2025        2.312              10
x_1 x_2       0.1575        2.676              001
x_2 x_1       0.1575        2.676              010
x_2 x_2       0.1225        3.039              011
x_1 x_3       0.09          3.486              111
x_3 x_1       0.09          3.486              0000
x_2 x_3       0.07          3.850              0001
x_3 x_2       0.07          3.850              1100
x_3 x_3       0.04          4.660              1101

2H(X) = 3.036 bits/letter pair
R_2 = 3.0675 bits/letter pair
R_2/2 = 1.534 bits/letter
Efficiency = 99.0%


In summary, we have demonstrated that efficient encoding for a DMS may
be done on a symbol-by-symbol basis using a variable-length code based on
the Huffman algorithm. Furthermore, the efficiency of the encoding procedure
is increased by encoding blocks of $J$ symbols at a time. Thus, the output of a
DMS with entropy $H(X)$ may be encoded by a variable-length code with an
average number of bits per source letter that approaches $H(X)$ as closely as
desired.
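The gain from block encoding in Example 3-3-3 can be reproduced with a few lines. The sketch below relies on the fact that the average length of a binary Huffman code equals the sum of the probabilities of all merged nodes, so the code words themselves never need to be built; the function names are hypothetical. Evaluated in floating point, the entropy comes out near 1.513 bits per letter, slightly below the rounded 1.518 quoted in the example, so the efficiencies printed here are marginally lower than the quoted ones.

```python
import heapq
import math
from itertools import product

def huffman_average_length(probabilities):
    """Average Huffman code word length, as the sum of merged node probabilities."""
    heap = list(probabilities)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)  # join two least probable
        total += merged
        heapq.heappush(heap, merged)
    return total

letter_probs = [0.45, 0.35, 0.20]
entropy = -sum(p * math.log2(p) for p in letter_probs)

def bits_per_letter(block_length):
    """R_J / J for Huffman coding of J-letter blocks from the memoryless source."""
    block_probs = [math.prod(block) for block in product(letter_probs, repeat=block_length)]
    return huffman_average_length(block_probs) / block_length

for block_length in (1, 2, 3, 4):
    rate = bits_per_letter(block_length)
    print(f"J={block_length}: R={rate:.5f} bits/letter, efficiency={entropy / rate:.4f}")
```

For $J = 1$ and $J = 2$ the rates are 1.55 and 1.53375 bits per letter, in agreement with Tables 3-3-2 and 3-3-3, and every rate stays within the bounds of (3-3-13).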

3-3-2 Discrete Stationary Sources

In the previous section, we described the efficient encoding of the output of a 
DMS. In this section, we consider discrete sources for which the sequence of 
output letters is statistically dependent. We limit our treatment to sources that 
are statistically stationary. 

Let us evaluate the entropy of any sequence of letters from a stationary
source. From the definition in (3-2-13) and the result given in (3-2-15), the
entropy of a block of random variables $X_1 X_2 \cdots X_k$ is

$$H(X_1 X_2 \cdots X_k) = \sum_{i=1}^{k} H(X_i \mid X_1 X_2 \cdots X_{i-1}) \tag{3-3-14}$$

where $H(X_i \mid X_1 X_2 \cdots X_{i-1})$ is the conditional entropy of the $i$th symbol from
the source given the previous $i - 1$ symbols. The entropy per letter for the
$k$-symbol block is defined as

$$H_k(X) = \frac{1}{k} H(X_1 X_2 \cdots X_k) \tag{3-3-15}$$

We define the information content of a stationary source as the entropy per
letter in (3-3-15) in the limit as $k \to \infty$. That is,

$$H_\infty(X) \equiv \lim_{k \to \infty} H_k(X) = \lim_{k \to \infty} \frac{1}{k} H(X_1 X_2 \cdots X_k) \tag{3-3-16}$$

The existence of this limit is established below.

As an alternative, we may define the entropy per letter from the source in
terms of the conditional entropy $H(X_k \mid X_1 X_2 \cdots X_{k-1})$ in the limit as $k$
approaches infinity. Fortunately, this limit also exists and is identical to the
limit in (3-3-16). That is,

$$H_\infty(X) = \lim_{k \to \infty} H(X_k \mid X_1 X_2 \cdots X_{k-1}) \tag{3-3-17}$$

This result is also established below. Our development follows the approach in
Gallager (1968).

First, we show that

$$H(X_k \mid X_1 X_2 \cdots X_{k-1}) \leq H(X_{k-1} \mid X_1 X_2 \cdots X_{k-2}) \tag{3-3-18}$$

for $k \geq 2$. From our previous result that conditioning on a random variable
cannot increase entropy, we have

$$H(X_k \mid X_1 X_2 \cdots X_{k-1}) \leq H(X_k \mid X_2 X_3 \cdots X_{k-1}) \tag{3-3-19}$$

From the stationarity of the source, we have

$$H(X_k \mid X_2 X_3 \cdots X_{k-1}) = H(X_{k-1} \mid X_1 X_2 \cdots X_{k-2}) \tag{3-3-20}$$

Hence, (3-3-18) follows immediately. This result demonstrates that
$H(X_k \mid X_1 X_2 \cdots X_{k-1})$ is a nonincreasing sequence in $k$.

Second, we have the result

$$H_k(X) \geq H(X_k \mid X_1 X_2 \cdots X_{k-1}) \tag{3-3-21}$$

which follows immediately from (3-3-14) and (3-3-15) and the fact that the last
term in the sum of (3-3-14) is a lower bound on each of the other $k - 1$ terms.
Third, from the definition of $H_k(X)$, we may write

$$H_k(X) = \frac{1}{k} \left[ H(X_1 X_2 \cdots X_{k-1}) + H(X_k \mid X_1 \cdots X_{k-1}) \right] = \frac{1}{k} \left[ (k-1) H_{k-1}(X) + H(X_k \mid X_1 \cdots X_{k-1}) \right] \leq \frac{k-1}{k} H_{k-1}(X) + \frac{1}{k} H_k(X)$$

which reduces to

$$H_k(X) \leq H_{k-1}(X) \tag{3-3-22}$$

Hence, $H_k(X)$ is a nonincreasing sequence in $k$.

Since $H_k(X)$ and the conditional entropy $H(X_k \mid X_1 \cdots X_{k-1})$ are both
nonnegative and nonincreasing with $k$, both limits must exist. Their limiting
forms can be established by using (3-3-14) and (3-3-15) to express $H_{k+j}(X)$ as

$$H_{k+j}(X) = \frac{1}{k+j} H(X_1 X_2 \cdots X_{k-1}) + \frac{1}{k+j} \left[ H(X_k \mid X_1 \cdots X_{k-1}) + H(X_{k+1} \mid X_1 \cdots X_k) + \cdots + H(X_{k+j} \mid X_1 \cdots X_{k+j-1}) \right]$$

Since the conditional entropy is nonincreasing, the first term in the square
brackets serves as an upper bound on the other terms. Hence,

$$H_{k+j}(X) \leq \frac{1}{k+j} H(X_1 X_2 \cdots X_{k-1}) + \frac{j+1}{k+j} H(X_k \mid X_1 X_2 \cdots X_{k-1}) \tag{3-3-23}$$

For a fixed $k$, the limit of (3-3-23) as $j \to \infty$ yields

$$H_\infty(X) \leq H(X_k \mid X_1 X_2 \cdots X_{k-1}) \tag{3-3-24}$$

But (3-3-24) is valid for all $k$; hence, it is valid for $k \to \infty$. Therefore,

$$H_\infty(X) \leq \lim_{k \to \infty} H(X_k \mid X_1 X_2 \cdots X_{k-1}) \tag{3-3-25}$$

On the other hand, from (3-3-21), we obtain in the limit as $k \to \infty$,

$$H_\infty(X) \geq \lim_{k \to \infty} H(X_k \mid X_1 X_2 \cdots X_{k-1}) \tag{3-3-26}$$

which establishes (3-3-17).
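The behavior of $H_k(X)$ and of the conditional entropy can be observed on a concrete source with memory. The sketch below assumes a two-state stationary Markov source, which is not discussed in the text and is chosen only because its block probabilities are easy to enumerate; the transition probabilities are made up. The conditional entropy is obtained from block entropies through (3-3-14) as $H(X_1 \cdots X_k) - H(X_1 \cdots X_{k-1})$.

```python
import math
from itertools import product

# Hypothetical two-state Markov source (an assumption for illustration).
TRANSITION = [[0.9, 0.1], [0.4, 0.6]]  # TRANSITION[a][b] = P(next = b | current = a)
STATIONARY = [0.8, 0.2]                # stationary distribution of TRANSITION

def block_entropy(k):
    """H(X_1 ... X_k) in bits, summed over all 2**k binary sequences."""
    if k == 0:
        return 0.0
    total = 0.0
    for sequence in product((0, 1), repeat=k):
        prob = STATIONARY[sequence[0]]
        for current, following in zip(sequence, sequence[1:]):
            prob *= TRANSITION[current][following]
        total -= prob * math.log2(prob)
    return total

def entropy_per_letter(k):
    """H_k(X) as defined in (3-3-15)."""
    return block_entropy(k) / k

def conditional_entropy(k):
    """H(X_k | X_1 ... X_{k-1}) as a difference of block entropies."""
    return block_entropy(k) - block_entropy(k - 1)

for k in range(1, 9):
    print(f"k={k}: H_k(X)={entropy_per_letter(k):.4f}  "
          f"H(X_k | past)={conditional_entropy(k):.4f}")
```

For this source the conditional entropy settles at about 0.569 bits from $k = 2$ onward, while $H_k(X)$ decreases toward the same value from above, in line with (3-3-21) and (3-3-22).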

Now suppose we have a discrete stationary source that emits $J$ letters with
$H_J(X)$ as the entropy per letter. We can encode the sequence of $J$ letters with a
variable-length Huffman code that satisfies the prefix condition by following
the procedure described in the previous section. The resulting code has an
average number of bits for the $J$-letter block that satisfies the condition

$$H(X_1 \cdots X_J) \leq R_J < H(X_1 \cdots X_J) + 1 \tag{3-3-27}$$

By dividing each term of (3-3-27) by $J$, we obtain the bounds on the average
number $R = R_J / J$ of bits per source letter as

$$H_J(X) \leq R < H_J(X) + \frac{1}{J} \tag{3-3-28}$$

By increasing the block size $J$, we can approach $H_J(X)$ arbitrarily closely, and
in the limit as $J \to \infty$, $R$ satisfies

$$H_\infty(X) \leq R < H_\infty(X) + \varepsilon \tag{3-3-29}$$

where $\varepsilon$ approaches zero as $1/J$. Thus, efficient encoding of stationary sources
is accomplished by encoding large blocks of symbols into code words. We
should emphasize, however, that the design of the Huffman code requires
knowledge of the joint pdf for the $J$-symbol blocks.


Lempel-Ziv Algorithm 

From our preceding discussion, we have observed that the Huffman coding 
algorithm yields optimal source codes in the sense that the code words satisfy 
the prefix condition and the average block length is a minimum. To design a 
Huffman code for a DMS, we need to know the probabilities of occurrence of 
all the source letters. In the case of a discrete source with memory, we must 
know the joint probabilities of blocks of length $n \geq 2$. However, in practice,
the statistics of a source output are often unknown. In principle, it is possible
to estimate the probabilities of the discrete source output by simply observing
a long information sequence emitted by the source and obtaining the
probabilities empirically. Except for the estimation of the marginal
probabilities $\{p_k\}$, corresponding to the frequency of occurrence of the individual
source output letters, the computational complexity involved in estimating 
joint probabilities is extremely high. Consequently, the application of the 
Huffman coding method to source coding for many real sources with memory 
is generally impractical. 

In contrast to the Huffman coding algorithm, the Lempel-Ziv source coding 
algorithm is designed to be independent of the source statistics. Hence, the 
Lempel-Ziv algorithm belongs to the class of universal source coding 
algorithms. It is a variable-to-fixed-length algorithm, where the encoding is 
performed as described below. 

In the Lempel-Ziv algorithm, the sequence at the output of the discrete
source is parsed into variable-length blocks, which are called phrases. A new
phrase is introduced every time a block of letters from the source differs from 
some previous phrase in the last letter. The phrases are listed in a dictionary, 
which stores the location of the existing phrases. In encoding a new phrase, we 
simply specify the location of the existing phrase in the dictionary and append 
the new letter. 

As an example, consider the binary sequence 

10101101001001110101000011001110101100011011 

Parsing the sequence as described above produces the following phrases:

1, 0, 10, 11, 01, 00, 100, 111, 010, 1000, 011, 001, 110, 101, 10001, 1011

We observe that each phrase in the sequence is a concatenation of a previous
phrase with a new output letter from the source. To encode the phrases, we
construct a dictionary as shown in Table 3-3-4. The dictionary locations are
numbered consecutively, beginning with 1 and counting up, in this case to 16,
which is the number of phrases in the sequence. The different phrases
corresponding to each location are also listed, as shown. The code words are
determined by listing the dictionary location (in binary form) of the previous
phrase that matches the new phrase in all but the last location. Then, the new
output letter is appended to the dictionary location of the previous phrase.
Initially, the location 0000 is used to encode a phrase that has not appeared
previously.

TABLE 3-3-4 DICTIONARY FOR LEMPEL-ZIV ALGORITHM

       Dictionary location   Dictionary contents   Code word
 1     0001                  1                     00001
 2     0010                  0                     00000
 3     0011                  10                    00010
 4     0100                  11                    00011
 5     0101                  01                    00101
 6     0110                  00                    00100
 7     0111                  100                   00110
 8     1000                  111                   01001
 9     1001                  010                   01010
10     1010                  1000                  01110
11     1011                  011                   01011
12     1100                  001                   01101
13     1101                  110                   01000
14     1110                  101                   00111
15     1111                  10001                 10101
16                           1011                  11101

The source decoder for the code constructs an identical table at the 
receiving end of the communication system and decodes the received sequence 
accordingly. 

It should be observed that the table encoded 44 source bits into 16 code 
words of five bits each, resulting in 80 coded bits. Hence, the algorithm 
provided no data compression at all. However, the inefficiency is due to the 
fact that the sequence we have considered is very short. As the sequence is 
increased in length, the encoding procedure becomes more efficient and results 
in a compressed sequence at the output of the source. 
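
The parsing and code word construction described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not a production coder: the function names `lz_parse` and `lz_codewords` are hypothetical, the location field is fixed at four bits as in Table 3-3-4, and dictionary overflow is not handled.

```python
def lz_parse(bits):
    """Split a binary string into phrases and (prefix location, new letter) pairs."""
    location = {"": 0}   # phrase -> dictionary location; 0 means "no previous phrase"
    phrases, pairs = [], []
    current = ""
    for letter in bits:
        current += letter
        if current not in location:
            # The block differs from a known phrase only in its last letter,
            # so it becomes a new phrase.
            location[current] = len(phrases) + 1
            phrases.append(current)
            pairs.append((location[current[:-1]], letter))
            current = ""
    return phrases, pairs, current   # `current` holds any unfinished tail


def lz_codewords(pairs, width):
    """Write each pair as `width` location bits followed by the new letter."""
    return [format(loc, "0{}b".format(width)) + letter for loc, letter in pairs]


sequence = "10101101001001110101000011001110101100011011"
phrases, pairs, tail = lz_parse(sequence)
codewords = lz_codewords(pairs, width=4)
print(phrases)
print(codewords)
print(len(sequence), "source bits ->", sum(len(c) for c in codewords), "coded bits")
```

For the 44-bit sequence of the example, the sketch reproduces the 16 phrases listed earlier and 16 five-bit code words, that is, 80 coded bits.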

How do we select the overall length of the table? In general, no matter how 
large the table is, it will eventually overflow. To solve the overflow problem, 
the source encoder and source decoder must agree to remove phrases from the 
respective dictionaries that are not useful and substitute new phrases in their 
place. 

The Lempel-Ziv algorithm is widely used in the compression of computer
files. The “compress” and “uncompress” utilities under the UNIX® operating 
system and numerous algorithms under the MS-DOS operating system are 
implementations of various versions of this algorithm. 

3-4 CODING FOR ANALOG SOURCES— OPTIMUM 
QUANTIZATION 

As indicated in Section 3-1, an analog source emits a message waveform $x(t)$
that is a sample function of a stochastic process $X(t)$. When $X(t)$ is a
bandlimited, stationary stochastic process, the sampling theorem allows us to 
represent X(t) by a sequence of uniform samples taken at the Nyquist rate. 

By applying the sampling theorem, the output of an analog source is 
converted to an equivalent discrete-time sequence of samples. The samples are 
then quantized in amplitude and encoded. One type of simple encoding is to 
represent each discrete amplitude level by a sequence of binary digits. Hence, 
if we have $L$ levels, we need $R = \log_2 L$ bits per sample if $L$ is a power of 2, or
$R = \lfloor \log_2 L \rfloor + 1$ if $L$ is not a power of 2. On the other hand, if the levels are
not equally probable, and the probabilities of the output levels are known, we 
may use Huffman coding (also called entropy coding) to improve the efficiency 
of the encoding process. 

Quantization of the amplitudes of the sampled signal results in data 
compression but it also introduces some distortion of the waveform or a loss of 
signal fidelity. The minimization of this distortion is considered in this section. 
Many of the results given in this section apply directly to a discrete-time, 
continuous amplitude, memoryless gaussian source. Such a source serves as a 
good model for the residual error in a number of source coding methods 
described in Section 3-5. 


3-4-1 Rate-Distortion Function 

Let us begin the discussion of signal quantization by considering the distortion 
introduced when the samples from the information source are quantized to a 
fixed number of bits. By the term “distortion,” we mean some measure of the 
difference between the actual source samples $\{x_k\}$ and the corresponding
quantized values $\{\tilde{x}_k\}$, which we denote by $d(x_k, \tilde{x}_k)$. For example, a commonly
used distortion measure is the squared-error distortion, defined as

$$ d(x_k, \tilde{x}_k) = (x_k - \tilde{x}_k)^2 \tag{3-4-1} $$

which is used to characterize the quantization error in PCM in Section 3-5-1.
Other distortion measures may take the general form

$$ d(x_k, \tilde{x}_k) = |x_k - \tilde{x}_k|^p \tag{3-4-2} $$

where p takes values from the set of positive integers. The case p = 2 has the 
advantage of being mathematically tractable. 

If $d(x_k, \tilde{x}_k)$ is the distortion measure per letter, the distortion between a
sequence of $n$ samples $\mathbf{X}_n$ and the corresponding $n$ quantized values $\tilde{\mathbf{X}}_n$ is the
average over the $n$ source output samples, i.e.,

$$ d(\mathbf{X}_n, \tilde{\mathbf{X}}_n) = \frac{1}{n} \sum_{k=1}^{n} d(x_k, \tilde{x}_k) \tag{3-4-3} $$

The source output is a random process, and, hence, the $n$ samples in $\mathbf{X}_n$ are
random variables. Therefore, $d(\mathbf{X}_n, \tilde{\mathbf{X}}_n)$ is a random variable. Its expected
value is defined as the distortion $D$, i.e.,

$$ D = E[d(\mathbf{X}_n, \tilde{\mathbf{X}}_n)] = \frac{1}{n} \sum_{k=1}^{n} E[d(x_k, \tilde{x}_k)] = E[d(x, \tilde{x})] \tag{3-4-4} $$


where the last step follows from the assumption that the source output process 
is stationary. 

Now suppose we have a memoryless source with a continuous-amplitude 
output $X$ that has a pdf $p(x)$, a quantized amplitude output alphabet $\tilde{X}$, and a
per letter distortion measure $d(x, \tilde{x})$, where $x \in X$ and $\tilde{x} \in \tilde{X}$. Then, the
minimum rate in bits per source output that is required to represent the output
$X$ of the memoryless source with a distortion less than or equal to $D$ is called
the rate-distortion function $R(D)$ and is defined as

$$ R(D) = \min_{p(\tilde{x}|x):\, E[d(X, \tilde{X})] \le D} I(X; \tilde{X}) \tag{3-4-5} $$

where $I(X; \tilde{X})$ is the average mutual information between $X$ and $\tilde{X}$. In general,
the rate $R(D)$ decreases as $D$ increases or, conversely, $R(D)$ increases as $D$
decreases.

One interesting model of a continuous-amplitude, memoryless information 
source is the gaussian source model. In this case, Shannon proved the following 
fundamental theorem on the rate-distortion function. 


Theorem: Rate-Distortion Function for a Memoryless Gaussian Source 
(Shannon, 1959a) 


The minimum information rate necessary to represent the output of a 
discrete-time, continuous-amplitude memoryless gaussian source based on a 
mean-square-error distortion measure per symbol (single letter distortion 
measure) is 



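$$ R_g(D) = \begin{cases} \frac{1}{2} \log_2 (\sigma_x^2 / D) & (0 \le D \le \sigma_x^2) \\ 0 & (D > \sigma_x^2) \end{cases} \tag{3-4-6} $$

where $\sigma_x^2$ is the variance of the gaussian source output.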

FIGURE 3-4-1 Rate distortion function for a continuous-amplitude memoryless
gaussian source.



We should note that (3-4-6) implies that no information need be transmitted
when the distortion $D \ge \sigma_x^2$. Specifically, $D = \sigma_x^2$ can be obtained by using
zeros in the reconstruction of the signal. For $D > \sigma_x^2$, we can use statistically
independent, zero-mean gaussian noise samples with a variance of $D - \sigma_x^2$ for
the reconstruction. $R_g(D)$ is plotted in Fig. 3-4-1.
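
To make the two branches of (3-4-6) concrete, the following short Python sketch evaluates the rate for a few distortion levels. The function name `rate_gaussian` and the handling of $D = 0$ (returned as an unbounded rate) are our own illustrative choices.

```python
import math


def rate_gaussian(D, variance):
    """Minimum rate in bits/sample from (3-4-6) for mean-square distortion D."""
    if D < 0:
        raise ValueError("distortion must be nonnegative")
    if D == 0:
        return math.inf      # perfect reconstruction needs an unbounded rate
    if D >= variance:
        return 0.0           # reconstructing with zeros already gives D = variance
    return 0.5 * math.log2(variance / D)


for D in (0.0625, 0.25, 1.0, 2.0):
    print(D, rate_gaussian(D, variance=1.0))
```

With unit variance, each reduction of the distortion by a factor of 4 costs one additional bit per sample, and the rate is zero for any distortion at or above the source variance.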

The rate distortion function R(D) of a source is associated with the 
following basic source coding theorem in information theory. 


Theorem: Source Coding with a Distortion Measure (Shannon, 1959a) 

There exists an encoding scheme that maps the source output into code 
words such that for any given distortion D, the minimum rate R(D) bits per 
symbol (sample) is sufficient to reconstruct the source output with an 
average distortion that is arbitrarily close to D. 

It is clear, therefore, that the rate distortion function R(D) for any source 
represents a lower bound on the source rate that is possible for a given level of 
distortion. 

Let us return to the result in (3-4-6) for the rate distortion function of a 
memoryless gaussian source. If we reverse the functional dependence between 
D and R, we may express D in terms of R as 

$$ D_g(R) = 2^{-2R} \sigma_x^2 \tag{3-4-7} $$

This function is called the distortion-rate function for the discrete-time,
memoryless gaussian source.

When we express the distortion in (3-4-7) in dB, we obtain

$$ 10 \log_{10} D_g(R) = -6R + 10 \log_{10} \sigma_x^2 \tag{3-4-8} $$

Note that the mean square distortion decreases at a rate of 6 dB/bit.
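
The slope follows in one step from (3-4-7). Written out, and noting that the constant 6 is a rounding of $20 \log_{10} 2 \approx 6.02$, we have

$$ 10 \log_{10} D_g(R) = 10 \log_{10}\left(2^{-2R} \sigma_x^2\right) = -20 R \log_{10} 2 + 10 \log_{10} \sigma_x^2 \approx -6.02 R + 10 \log_{10} \sigma_x^2 $$

so that each additional bit per sample lowers the mean square distortion by about 6 dB.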

Explicit results on the rate distortion functions for memoryless non-gaussian 
sources are not available. However, there are useful upper and lower bounds 
on the rate distortion function for any discrete-time, continuous-amplitude,
memoryless source. An upper bound is given by the following theorem. 


Theorem: Upper Bound on R(D)

The rate-distortion function of a memoryless, continuous-amplitude source
with zero mean and finite variance $\sigma_x^2$ with respect to the mean-square-error
distortion measure is upper bounded as

$$ R(D) \le \frac{1}{2} \log_2 \frac{\sigma_x^2}{D} \qquad (0 \le D \le \sigma_x^2) \tag{3-4-9} $$

A proof of this theorem is given by Berger (1971). It implies that the 
gaussian source requires the maximum rate among all other sources for a 
specified level of mean square distortion. Thus, the rate distortion function $R(D)$ of any
continuous-amplitude, memoryless source with zero mean and finite variance
satisfies the condition $R(D) \le R_g(D)$. Similarly, the distortion-rate function
of the same source satisfies the condition

$$ D(R) \le D_g(R) = 2^{-2R} \sigma_x^2 \tag{3-4-10} $$

A lower bound on the rate-distortion function also exists. This is called the 
Shannon lower bound for a mean-square-error distortion measure, and is given 
as 

$$ R^*(D) = H(X) - \frac{1}{2} \log_2 2\pi e D \tag{3-4-11} $$

where $H(X)$ is the differential entropy of the continuous-amplitude, memoryless
source. The distortion-rate function corresponding to (3-4-11) is

$$ D^*(R) = \frac{1}{2\pi e}\, 2^{-2[R - H(X)]} \tag{3-4-12} $$

Therefore, the rate-distortion function for any continuous-amplitude, memoryless
source is bounded from above and below as

$$ R^*(D) \le R(D) \le R_g(D) \tag{3-4-13} $$

and the corresponding distortion-rate function is bounded as

$$ D^*(R) \le D(R) \le D_g(R) \tag{3-4-14} $$

The differential entropy of the memoryless gaussian source is 

$$ H_g(X) = \frac{1}{2} \log_2 2\pi e \sigma_x^2 \tag{3-4-15} $$

so that the lower bound $R^*(D)$ in (3-4-11) reduces to $R_g(D)$. Now, if we
express $D^*(R)$ in terms of decibels and normalize it by setting $\sigma_x^2 = 1$ [or
dividing $D^*(R)$ by $\sigma_x^2$], we obtain from (3-4-12)

$$ 10 \log_{10} D^*(R) = -6R - 6[H_g(X) - H(X)] \tag{3-4-16} $$

or, equivalently,

$$ 10 \log_{10} \frac{D_g(R)}{D^*(R)} = 6[H_g(X) - H(X)] \ \text{dB} = 6[R_g(D) - R^*(D)] \ \text{dB} \tag{3-4-17} $$

The relations in (3-4-16) and (3-4-17) allow us to compare the lower bound in
the distortion with the upper bound which is the distortion for the gaussian
source. We note that $D^*(R)$ also decreases at -6 dB/bit. We should also
mention that the differential entropy $H(X)$ is upper-bounded by $H_g(X)$, as
shown by Shannon (1948b).

Table 3-4-1 lists four pdfs that are models commonly used for source signal 
distributions. The table shows the differential entropies, the differences in rates 
in bits/sample, and the difference in distortion between the upper and lower 
bounds. Note that the gamma pdf shows the greatest deviation from the 
gaussian. The Laplacian pdf is the most similar to the gaussian, and the 
uniform pdf ranks second of the pdfs shown in the table. These results provide 
some benchmarks on the difference between the upper and lower bounds on 
distortion and rate. 
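
The rate and distortion gaps quoted for these pdfs can be checked numerically from the differential entropies. The sketch below is an illustration under stated assumptions: it takes $\sigma_x = 1$, uses the standard unit-variance entropy expressions for the four pdfs, and converts bits to dB with the rounded factor 6 that appears in (3-4-17). The dictionary name `entropy_bits` is ours.

```python
import math

# Differential entropies in bits for unit variance (sigma_x = 1).
entropy_bits = {
    "gaussian": 0.5 * math.log2(2 * math.pi * math.e),
    "uniform": 0.5 * math.log2(12),
    "laplacian": 0.5 * math.log2(2 * math.e ** 2),
    "gamma": 0.5 * math.log2(4 * math.pi * math.exp(0.423) / 3),
}

h_gauss = entropy_bits["gaussian"]
gaps = {name: h_gauss - h for name, h in entropy_bits.items()}

for name, gap in gaps.items():
    # (3-4-17): the distortion gap in dB is 6 times the entropy gap in bits.
    print("{:10s} rate gap = {:.3f} bits/sample, distortion gap = {:.2f} dB".format(
        name, gap, 6 * gap))
```

The computed rate gaps agree with the tabulated 0.255, 0.104, and 0.709 bits/sample to within rounding.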

Before concluding this section, let us consider a band-limited gaussian 
source with spectral density 



$$ \Phi(f) = \begin{cases} \sigma_x^2 / 2W & (|f| \le W) \\ 0 & (|f| > W) \end{cases} \tag{3-4-18} $$


When the output of this source is sampled at the Nyquist rate, the samples are 
uncorrelated and, since the source is gaussian, they are also statistically


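TABLE 3-4-1 DIFFERENTIAL ENTROPIES AND RATE DISTORTION COMPARISONS OF FOUR
COMMON PDFs FOR SIGNAL MODELS

pdf        p(x)                                                                     H(X)                                               R_g(D) - R*(D)   D_g(R) - D*(R)
                                                                                                                                       (bits/sample)    (dB)

Gaussian   $\frac{1}{\sqrt{2\pi}\,\sigma_x} e^{-x^2/2\sigma_x^2}$                   $\frac{1}{2}\log_2(2\pi e\sigma_x^2)$              0                0
Uniform    $\frac{1}{2\sqrt{3}\,\sigma_x}$, $|x| \le \sqrt{3}\,\sigma_x$            $\frac{1}{2}\log_2(12\sigma_x^2)$                  0.255            1.53
Laplacian  $\frac{1}{\sqrt{2}\,\sigma_x} e^{-\sqrt{2}|x|/\sigma_x}$                 $\frac{1}{2}\log_2(2e^2\sigma_x^2)$                0.104            0.62
Gamma      $\frac{\sqrt[4]{3}}{\sqrt{8\pi\sigma_x|x|}} e^{-\sqrt{3}|x|/2\sigma_x}$  $\frac{1}{2}\log_2(4\pi e^{0.423}\sigma_x^2/3)$    0.709            4.25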




independent. Hence, the equivalent discrete-time gaussian source is memoryless.
The rate-distortion function for each sample is given by (3-4-6).
Therefore, the rate-distortion function for the band-limited white gaussian
source in bits/s is

$$ R_g(D) = W \log_2 \frac{\sigma_x^2}{D} \qquad (0 \le D \le \sigma_x^2) \tag{3-4-19} $$

The corresponding distortion-rate function is

$$ D_g(R) = 2^{-R/W} \sigma_x^2 \tag{3-4-20} $$

which, when expressed in decibels and normalized by $\sigma_x^2$, becomes

$$ 10 \log_{10} \frac{D_g(R)}{\sigma_x^2} = -\frac{3R}{W} \tag{3-4-21} $$

The more general case in which the gaussian process is neither white nor
band-limited has been treated by Gallager (1968) and Goblick and Holsinger
(1967).


3-4-2 Scalar Quantization 

In source encoding, the quantizer can be optimized if we know the probability 
density function of the signal amplitude at the input to the quantizer. For 
example, suppose that the sequence $\{x_n\}$ at the input to the quantizer has a pdf
$p(x)$ and let $L = 2^R$ be the desired number of levels. We wish to design the
optimum scalar quantizer that minimizes some function of the quantization
error $q = \tilde{x} - x$, where $\tilde{x}$ is the quantized value of $x$. To elaborate, suppose that
$f(\tilde{x} - x)$ denotes the desired function of the error. Then, the distortion
resulting from quantization of the signal amplitude is

$$ D = \int_{-\infty}^{\infty} f(\tilde{x} - x)\, p(x)\, dx \tag{3-4-22} $$


In general, an optimum quantizer is one that minimizes D by optimally 
selecting the output levels and the corresponding input range of each output 
level. This optimization problem has been considered by Lloyd (1982) and Max 
(1960), and the resulting optimum quantizer is usually called the Lloyd-Max 
quantizer. 

For a uniform quantizer, the output levels are specified as $\tilde{x}_k = \frac{1}{2}(2k - 1)\Delta$,
corresponding to an input signal amplitude in the range $(k - 1)\Delta \le x < k\Delta$,
where $\Delta$ is the step size. When the uniform quantizer is symmetric with an
even number of levels, the average distortion in (3-4-22) may be expressed as

$$ D = 2 \sum_{k=1}^{L/2-1} \int_{(k-1)\Delta}^{k\Delta} f(\tfrac{1}{2}(2k - 1)\Delta - x)\, p(x)\, dx + 2 \int_{(L/2-1)\Delta}^{\infty} f(\tfrac{1}{2}(L - 1)\Delta - x)\, p(x)\, dx \tag{3-4-23} $$





TABLE 3-4-2 OPTIMUM STEP SIZES FOR UNIFORM QUANTIZATION OF A
GAUSSIAN RANDOM VARIABLE

Number of        Optimum step                 Minimum MSE     $10 \log D_{\min}$
output levels    size $\Delta_{\text{opt}}$   $D_{\min}$      (dB)

      2            1.596                       0.3634           -4.4
      4            0.9957                      0.1188           -9.25
      8            0.5860                      0.03744         -14.27
     16            0.3352                      0.01154         -19.38
     32            0.1881                      0.00349         -24.57


In this case, the minimization of $D$ is carried out with respect to the step-size
parameter $\Delta$. By differentiating $D$ with respect to $\Delta$, we obtain

$$ \sum_{k=1}^{L/2-1} (2k - 1) \int_{(k-1)\Delta}^{k\Delta} f'(\tfrac{1}{2}(2k - 1)\Delta - x)\, p(x)\, dx + (L - 1) \int_{(L/2-1)\Delta}^{\infty} f'(\tfrac{1}{2}(L - 1)\Delta - x)\, p(x)\, dx = 0 \tag{3-4-24} $$

where $f'(x)$ denotes the derivative of $f(x)$.

By selecting the error criterion function $f(x)$, the solution of (3-4-24) for the
optimum step size can be obtained numerically on a digital computer for any
given pdf $p(x)$. For the mean-square-error criterion, for which $f(x) = x^2$, Max
(1960) evaluated the optimum step size $\Delta_{\text{opt}}$ and the minimum mean square
error when the pdf $p(x)$ is zero-mean gaussian with unit variance. Some of
these results are given in Table 3-4-2. We observe that the minimum mean
square distortion $D_{\min}$ decreases by a little more than 5 dB for each doubling of
the number of levels $L$. Hence, each additional bit that is employed in a
uniform quantizer with optimum step size $\Delta_{\text{opt}}$ for a gaussian-distributed signal
amplitude reduces the distortion by more than 5 dB.
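
As an illustration of how such a numerical solution might be carried out, the sketch below minimizes the mean square distortion of a symmetric uniform quantizer directly, rather than solving the derivative condition. It assumes a zero-mean, unit-variance gaussian pdf, uses closed-form gaussian integrals for each cell, and searches for the step size by golden-section search. The helper names (`cell_mse`, `uniform_mse`, `golden_section_min`) and the search interval are our own assumptions, not part of Max's procedure.

```python
import math


def pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cell_mse(a, b, y):
    """Integral of (y - x)^2 p(x) over [a, b) for finite a; b may be math.inf."""
    edge_b = 0.0 if math.isinf(b) else b * pdf(b)
    m0 = cdf(b) - cdf(a)               # probability of the cell
    m1 = pdf(a) - pdf(b)               # integral of x p(x)
    m2 = m0 + a * pdf(a) - edge_b      # integral of x^2 p(x)
    return m2 - 2.0 * y * m1 + y * y * m0


def uniform_mse(step, levels):
    """Mean square distortion of a symmetric uniform quantizer, even `levels`."""
    half = levels // 2
    inner = sum(cell_mse((k - 1) * step, k * step, 0.5 * (2 * k - 1) * step)
                for k in range(1, half))
    outer = cell_mse((half - 1) * step, math.inf, 0.5 * (levels - 1) * step)
    return 2.0 * (inner + outer)


def golden_section_min(f, lo, hi, tol=1e-6):
    """Locate the minimum of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d
            d = lo + g * (hi - lo)
    return 0.5 * (lo + hi)


for levels in (2, 4, 8):
    step = golden_section_min(lambda s: uniform_mse(s, levels), 0.01, 4.0)
    d_min = uniform_mse(step, levels)
    print(levels, round(step, 4), round(d_min, 5), round(10 * math.log10(d_min), 2))
```

For two levels the distortion is $1 - \Delta\sqrt{2/\pi} + \Delta^2/4$, so the optimum can also be found by hand: $\Delta_{\text{opt}} = 2\sqrt{2/\pi} \approx 1.596$ and $D_{\min} = 1 - 2/\pi \approx 0.3634$.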

By relaxing the constraint that the quantizer be uniform, the distortion can 
be reduced further. In this case, we let the output level be $\tilde{x} = \tilde{x}_k$ when the
input signal amplitude is in the range $x_{k-1} \le x < x_k$. For an $L$-level quantizer,
the end points are $x_0 = -\infty$ and $x_L = \infty$. The resulting distortion is

$$ D = \sum_{k=1}^{L} \int_{x_{k-1}}^{x_k} f(\tilde{x}_k - x)\, p(x)\, dx \tag{3-4-25} $$

which is now minimized by optimally selecting the $\{\tilde{x}_k\}$ and $\{x_k\}$.

The necessary conditions for a minimum distortion are obtained by
differentiating $D$ with respect to the $\{x_k\}$ and $\{\tilde{x}_k\}$. The result of this
minimization is the pair of equations

$$ f(\tilde{x}_k - x_k) = f(\tilde{x}_{k+1} - x_k), \qquad k = 1, 2, \ldots, L - 1 \tag{3-4-26} $$

$$ \int_{x_{k-1}}^{x_k} f'(\tilde{x}_k - x)\, p(x)\, dx = 0, \qquad k = 1, 2, \ldots, L \tag{3-4-27} $$

TABLE 3-4-3 OPTIMUM FOUR-LEVEL QUANTIZER FOR A GAUSSIAN
RANDOM VARIABLE

Level k      $x_k$        $\tilde{x}_k$

   1        -0.9816       -1.510
   2         0.0          -0.4528
   3         0.9816        0.4528
   4         $\infty$      1.510

$D_{\min} = 0.1175$
$10 \log D_{\min} = -9.3$ dB



As a special case, we again consider minimizing the mean square value of 
the distortion. In this case, $f(x) = x^2$ and, hence, (3-4-26) becomes

$$ x_k = \tfrac{1}{2}(\tilde{x}_k + \tilde{x}_{k+1}), \qquad k = 1, 2, \ldots, L - 1 \tag{3-4-28} $$

which is the midpoint between $\tilde{x}_k$ and $\tilde{x}_{k+1}$. The corresponding equations
determining $\{\tilde{x}_k\}$ are

$$ \int_{x_{k-1}}^{x_k} (\tilde{x}_k - x)\, p(x)\, dx = 0, \qquad k = 1, 2, \ldots, L \tag{3-4-29} $$

Thus, $\tilde{x}_k$ is the centroid of the area of $p(x)$ between $x_{k-1}$ and $x_k$. These
equations may be solved numerically for any given $p(x)$.
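
One simple way to solve the midpoint and centroid conditions numerically is to apply them alternately until the levels stop changing. The Python sketch below does this for a zero-mean, unit-variance gaussian pdf. It is an illustration only: the function name `lloyd_max`, the equally spaced starting levels, and the fixed iteration count are our own assumptions, and the iteration is guaranteed only to reach a stationary point of the distortion.

```python
import math


def pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lloyd_max(levels, iterations=2000):
    """Alternate the midpoint and centroid conditions for a unit gaussian pdf."""
    # Assumed starting point: equally spaced output levels inside (-2, 2).
    outputs = [-2.0 + 4.0 * (k + 0.5) / levels for k in range(levels)]
    for _ in range(iterations):
        # Midpoint condition: each end point lies halfway between adjacent outputs.
        inner = [0.5 * (outputs[k] + outputs[k + 1]) for k in range(levels - 1)]
        ends = [-math.inf] + inner + [math.inf]
        # Centroid condition: each output is the conditional mean of its cell.
        outputs = [(pdf(ends[k]) - pdf(ends[k + 1])) / (cdf(ends[k + 1]) - cdf(ends[k]))
                   for k in range(levels)]
    probs = [cdf(ends[k + 1]) - cdf(ends[k]) for k in range(levels)]
    # With centroid outputs and unit variance, the MSE is 1 - sum_k p_k * output_k^2.
    mse = 1.0 - sum(p * y * y for p, y in zip(probs, outputs))
    return ends, outputs, mse


ends4, outputs4, mse4 = lloyd_max(4)
ends8, outputs8, mse8 = lloyd_max(8)
print([round(x, 4) for x in ends4[1:-1]], [round(y, 4) for y in outputs4], round(mse4, 4))
print(round(mse8, 5), round(10 * math.log10(mse8), 2))
```

For four levels the iteration settles near end points of 0 and ±0.9816 and output levels of ±0.4528 and ±1.510, in agreement with Max's four-level results.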

Tables 3-4-3 and 3-4-4 give the results of this optimization obtained by Max 


TABLE 3-4-4 OPTIMUM EIGHT-LEVEL QUANTIZER FOR A GAUSSIAN
RANDOM VARIABLE (MAX, 1960)

Level k      $x_k$        $\tilde{x}_k$

   1        -1.748        -2.152
   2        -1.050        -1.344
   3        -0.5006       -0.7560
   4         0            -0.2451
   5         0.5006        0.2451
   6         1.050         0.7560
   7         1.748         1.344
   8         $\infty$      2.152

$D_{\min} = 0.03454$
$10 \log D_{\min} = -14.62$ dB

TABLE 3-4-5 COMPARISON OF OPTIMUM UNIFORM AND NONUNIFORM
QUANTIZERS FOR A GAUSSIAN RANDOM VARIABLE (MAX, 1960; PAEZ AND
GLISSON, 1972)

                        $10 \log_{10} D_{\min}$
R (bits/sample)     Uniform (dB)     Nonuniform (dB)

      1               -4.4              -4.4
      2               -9.25             -9.30
      3              -14.27            -14.62
      4              -19.38            -20.22
      5              -24.57            -26.02
      6              -29.83            -31.89
      7              -35.13            -37.81


(1960) for the optimum four-level and eight-level quantizers of a gaussian 
distributed signal amplitude having zero mean and unit variance. In Table 
3-4-5, we compare the minimum mean square distortion of a uniform quantizer 
to that of a nonuniform quantizer for the gaussian-distributed signal amplitude. 
From the results of this table, we observe that the difference in the 
performance of the two types of quantizers is relatively small for small values 
of $R$ (less than 0.5 dB for $R \le 3$), but it increases as $R$ increases. For example,
at $R = 5$, the nonuniform quantizer is approximately 1.5 dB better than the
uniform quantizer.

It is instructive to plot the minimum distortion as a function of the bit rate 
R = log 2 L bits per source sample (letter) for both the uniform and nonuniform 
quantizers. These curves are illustrated in Fig. 3-4-2. The functional depen- 
dence of the distortion D on the bit rate R may be expressed as D(R), the 
distortion-rate function. We observe that the distortion-rate function for the 
optimum nonuniform quantizer falls below that of the optimum uniform 
quantizer. 

Since any quantizer reduces a continuous amplitude source into a discrete 
amplitude source, we may treat the discrete amplitudes as letters, say
$\tilde{X} = \{\tilde{x}_k, 1 \le k \le L\}$, with associated probabilities $\{p_k\}$. If the signal
amplitudes are statistically independent, the discrete source is memoryless and,
hence, its entropy is

$$ H(\tilde{X}) = -\sum_{k=1}^{L} p_k \log_2 p_k \tag{3-4-30} $$

For example, the optimum four-level nonuniform quantizer for the
gaussian-distributed signal amplitude results in the probabilities $p_1 = p_4 = 0.1635$
for the two outer levels and $p_2 = p_3 = 0.3365$ for the two inner levels.
The entropy for the discrete source is $H(\tilde{X}) = 1.911$ bits/letter. Hence, with
entropy coding (Huffman coding) of blocks of output letters, we can achieve
the minimum distortion of -9.30 dB with 1.911 bits/letter instead of
2 bits/letter. Max (1960) has given the entropy for the discrete source letters
resulting from quantization. Table 3-4-6 lists the values of the entropy for the
nonuniform quantizer. These values are also plotted in Fig. 3-4-2 and labeled
entropy coding.

FIGURE 3-4-2 Distortion versus rate curves for discrete-time memoryless gaussian source.

TABLE 3-4-6 ENTROPY OF THE OUTPUT OF AN OPTIMUM
NONUNIFORM QUANTIZER FOR A GAUSSIAN
RANDOM VARIABLE (MAX, 1960)

R                 Entropy           Distortion
(bits/sample)     (bits/letter)     $10 \log_{10} D_{\min}$

      1             1.0               -4.4
      2             1.911             -9.30
      3             2.825            -14.62
      4             3.765            -20.22
      5             4.730            -26.02

From this discussion, we conclude that the quantizer can be optimized when
the pdf of the continuous source output is known. The optimum quantizer of
$L = 2^R$ levels results in a minimum distortion of $D(R)$, where $R = \log_2 L$
bits/sample. Thus, this distortion can be achieved by simply representing each
quantized sample by R bits. However, more efficient encoding is possible. The 
discrete source output that results from quantization is characterized by a set 
of probabilities {p k } that can be used to design efficient variable-length codes 
for the source output (entropy coding). The efficiency of any encoding method 
can be compared with the distortion-rate function or, equivalently, the
rate-distortion function for the discrete-time, continuous-amplitude source that 
is characterized by the given pdf. 
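
As a small numerical illustration of the entropy that bounds such variable-length codes, the Python sketch below evaluates the entropy of the four-level quantizer output from the level probabilities quoted earlier (0.1635 for the two outer levels and 0.3365 for the two inner levels). The function name `entropy_bits` is ours.

```python
import math


def entropy_bits(probabilities):
    """Entropy in bits/letter of a memoryless discrete source."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


p = [0.1635, 0.3365, 0.3365, 0.1635]
H = entropy_bits(p)
print(H)
```

The result is about 1.91 bits/letter, slightly below the 2 bits/letter needed when each of the four levels is represented by a fixed-length code word.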

If we compare the performance of the optimum nonuniform quantizer with 
the distortion-rate function, we find, for example, that at a distortion of 
-26 dB, entropy coding is 0.41 bits/sample more than the minimum rate given 
by (3-4-8), and simple block coding of each letter requires 0.68 bits/sample 
more than the minimum rate. We also observe that the distortion rate 
functions for the optimal uniform and nonuniform quantizers for the gaussian 
source approach the slope of -6 dB/bit asymptotically for large R. 
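
The two rate differences quoted above can be reproduced with a short calculation. The sketch below assumes unit variance, takes the five-bit nonuniform quantizer values of -26.02 dB distortion and 4.730 bits/letter entropy, and inverts the gaussian distortion-rate function exactly rather than with the rounded 6 dB/bit slope. The function name `min_rate_bits` is ours.

```python
import math


def min_rate_bits(distortion_db):
    """Invert D_g(R) = 2^(-2R) for unit variance: R = -(1/2) log2 D."""
    D = 10.0 ** (distortion_db / 10.0)
    return -0.5 * math.log2(D)


r_min = min_rate_bits(-26.02)
excess_entropy_coding = 4.730 - r_min   # entropy-coded nonuniform quantizer
excess_block_coding = 5.0 - r_min       # five bits per quantized sample
print(r_min, excess_entropy_coding, excess_block_coding)
```

The minimum rate comes out near 4.32 bits/sample, which gives excess rates of roughly 0.41 and 0.68 bits/sample.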


3-4-3 Vector Quantization 

In the previous section, we considered the quantization of the output signal 
from a continuous-amplitude source when the quantization is performed on a 
sample-by-sample basis, i.e., by scalar quantization. In this section, we consider 
the joint quantization of a block of signal samples or a block of signal 
parameters. This type of quantization is called block or vector quantization. It 
is widely used in speech coding for digital cellular systems. 

A fundamental result of rate-distortion theory is that better performance 
can be achieved by quantizing vectors instead of scalars, even if the 
continuous-amplitude source is memoryless. If, in addition, the signal samples 
or signal parameters are statistically dependent, we can exploit the dependency 
by jointly quantizing blocks of samples or parameters and, thus, achieve an 
even greater efficiency (lower bit rate) compared with that which is achieved 
by scalar quantization. 

The vector quantization problem may be formulated as follows. We have an 
$n$-dimensional vector $\mathbf{X} = [x_1\ x_2\ \cdots\ x_n]$ with real-valued, continuous-amplitude
components $\{x_k, 1 \le k \le n\}$ that are described by a joint pdf
$p(x_1, x_2, \ldots, x_n)$. The vector $\mathbf{X}$ is quantized into another $n$-dimensional vector
$\tilde{\mathbf{X}}$ with components $\{\tilde{x}_k, 1 \le k \le n\}$. We express the quantization as $Q(\cdot)$,
so that

$$ \tilde{\mathbf{X}} = Q(\mathbf{X}) \tag{3-4-31} $$

where $\tilde{\mathbf{X}}$ is the output of the vector quantizer when the input vector is $\mathbf{X}$.

Basically, vector quantization of blocks of data may be viewed as a pattern 
recognition problem involving the classification of blocks of data into a discrete 
number of categories or cells in a way that optimizes some fidelity criterion, 
such as mean square distortion. For example, let us consider the quantization 

FIGURE 3-4-3 An example of quantization in two-dimensional space.


of two-dimensional vectors $\mathbf{X} = [x_1\ x_2]$. The two-dimensional space is
partitioned into cells as illustrated in Fig. 3-4-3, where we have arbitrarily
selected hexagonal-shaped cells $\{C_k\}$. All input vectors that fall in cell $C_k$ are
quantized into the vector $\tilde{\mathbf{X}}_k$, which is shown in Fig. 3-4-3 as the center of the
hexagon. In this example, there are $L = 37$ vectors, one for each of the 37 cells
into which the two-dimensional space has been partitioned. We denote the set
of possible output vectors as $\{\tilde{\mathbf{X}}_k, 1 \le k \le L\}$.

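In general, quantization of the $n$-dimensional vector $\mathbf{X}$ into an $n$-dimensional
vector $\tilde{\mathbf{X}}$ introduces a quantization error or a distortion $d(\mathbf{X}, \tilde{\mathbf{X}})$.
The average distortion over the set of input vectors $\mathbf{X}$ is

$$ D = \sum_{k=1}^{L} P(\mathbf{X} \in C_k)\, E[d(\mathbf{X}, \tilde{\mathbf{X}}_k) \mid \mathbf{X} \in C_k] = \sum_{k=1}^{L} P(\mathbf{X} \in C_k) \int_{\mathbf{X} \in C_k} d(\mathbf{X}, \tilde{\mathbf{X}}_k)\, p(\mathbf{X})\, d\mathbf{X} \tag{3-4-32} $$

where $P(\mathbf{X} \in C_k)$ is the probability that the vector $\mathbf{X}$ falls in the cell $C_k$ and
$p(\mathbf{X})$ is the joint pdf of the $n$ random variables. As in the case of scalar
quantization, we can minimize $D$ by selecting the cells $\{C_k, 1 \le k \le L\}$ for a
given pdf $p(\mathbf{X})$.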

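A commonly used distortion measure is the mean square error ($l_2$ norm)
defined as

$$ d_2(\mathbf{X}, \tilde{\mathbf{X}}) = \frac{1}{n} (\mathbf{X} - \tilde{\mathbf{X}})'(\mathbf{X} - \tilde{\mathbf{X}}) = \frac{1}{n} \sum_{k=1}^{n} (x_k - \tilde{x}_k)^2 \tag{3-4-33} $$

or, more generally, the weighted mean square error

$$ d_{2W}(\mathbf{X}, \tilde{\mathbf{X}}) = (\mathbf{X} - \tilde{\mathbf{X}})' \mathbf{W} (\mathbf{X} - \tilde{\mathbf{X}}) \tag{3-4-34} $$

where $\mathbf{W}$ is a positive-definite weighting matrix. Usually, $\mathbf{W}$ is selected to be
the inverse of the covariance matrix of the input data vector $\mathbf{X}$.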

Other distortion measures that are sometimes used are special cases of the $l_p$
norm defined as

$$ d_p(\mathbf{X}, \tilde{\mathbf{X}}) = \frac{1}{n} \sum_{k=1}^{n} |x_k - \tilde{x}_k|^p \tag{3-4-35} $$


The special case p = 1 is often used as an alternative to p = 2. 

Vector quantization is not limited to quantizing a block of signal samples of 
a source waveform. It can also be applied to quantizing a set of parameters 
extracted from the data. For example, in linear predictive coding (LPC), 
described in Section 3-5-3, the parameters extracted from the signal are the 
prediction coefficients, which are the coefficients in the all-pole filter model for 
the source that generates the observed data. These parameters can be 
considered as a block and quantized as a block by application of some 
appropriate distortion measure. In the case of speech encoding, an appropriate 
distortion measure, proposed by Itakura and Saito (1968, 1975), is the 
weighted square error where the weighting matrix W is selected to be the 
normalized autocorrelation matrix $\mathbf{\Phi}$ of the observed data.

In speech processing, an alternative set of parameters that may be quantized 
as a block and transmitted to the receiver is the set of reflection coefficients 
$\{a_{ii}, 1 \le i \le m\}$. Yet another set of parameters that is sometimes used for vector
quantization in linear predictive coding of speech comprises the log-area ratios
$\{r_k\}$, which are defined in terms of the reflection coefficients as

$$ r_k = \log \frac{1 + a_{kk}}{1 - a_{kk}}, \qquad 1 \le k \le m \tag{3-4-36} $$

Now, let us return to the mathematical formulation of vector quantization 
and let us consider the partitioning of the n -dimensional space into L cells 
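$\{C_k, 1 \le k \le L\}$ so that the average distortion is minimized over all $L$-level
quantizers. There are two conditions for optimality. The first is that the
optimal quantizer employs a nearest-neighbor selection rule, which may be
expressed mathematically as

$$ Q(\mathbf{X}) = \tilde{\mathbf{X}}_k $$

if and only if

$$ D(\mathbf{X}, \tilde{\mathbf{X}}_k) \le D(\mathbf{X}, \tilde{\mathbf{X}}_j), \qquad k \ne j, \quad 1 \le j \le L \tag{3-4-37} $$

The second condition necessary for optimality is that each output vector $\tilde{\mathbf{X}}_k$ be
chosen to minimize the average distortion in cell $C_k$. In other words, $\tilde{\mathbf{X}}_k$ is the
vector in $C_k$ that minimizes

$$ D_k = E[d(\mathbf{X}, \tilde{\mathbf{X}}) \mid \mathbf{X} \in C_k] = \int_{\mathbf{X} \in C_k} d(\mathbf{X}, \tilde{\mathbf{X}})\, p(\mathbf{X})\, d\mathbf{X} \tag{3-4-38} $$

The vector $\tilde{\mathbf{X}}_k$ that minimizes $D_k$ is called the centroid of the cell. Thus, these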
conditions for optimality can be applied to partition the n-dimensional space 
into cells $\{C_k, 1 \le k \le L\}$ when the joint pdf $p(\mathbf{X})$ is known. It is clear that
these two conditions represent the generalization of the optimum scalar
quantization problem to the $n$-dimensional vector quantization problem. In
general, we expect the code vectors to be closer together in regions where the 
joint pdf is large and farther apart in regions where p(X) is small. 

As an upper bound on the distortion of a vector quantizer, we may use the 
distortion of the optimal scalar quantizer, which can be applied to each 
component of the vector as described in the previous section. On the other 
hand, the best performance that can be achieved by optimum vector 
quantization is given by the rate-distortion function or, equivalently, the 
distortion-rate function. 

The distortion-rate function, which was introduced in the previous section, 
may be defined in the context of vector quantization as follows. Suppose we 
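form a vector $\mathbf{X}$ of dimension $n$ from $n$ consecutive samples $\{x_m\}$. The vector $\mathbf{X}$
is then quantized to form $\tilde{\mathbf{X}} = Q(\mathbf{X})$, where $\tilde{\mathbf{X}}$ is a vector from the set
$\{\tilde{\mathbf{X}}_k, 1 \le k \le L\}$. As described above, the average distortion $D$ resulting from
representing $\mathbf{X}$ by $\tilde{\mathbf{X}}$ is $E[d(\mathbf{X}, \tilde{\mathbf{X}})]$, where $d(\mathbf{X}, \tilde{\mathbf{X}})$ is the distortion per
dimension, e.g.,

$$ d(\mathbf{X}, \tilde{\mathbf{X}}) = \frac{1}{n} \sum_{k=1}^{n} (x_k - \tilde{x}_k)^2 $$

The vectors $\{\tilde{\mathbf{X}}_k, 1 \le k \le L\}$ can be transmitted at an average bit rate of

$$ R = \frac{H(\tilde{\mathbf{X}})}{n} \ \text{bits/sample} \tag{3-4-39} $$

where $H(\tilde{\mathbf{X}})$ is the entropy of the quantized source output defined as

$$ H(\tilde{\mathbf{X}}) = -\sum_{i=1}^{L} p(\tilde{\mathbf{X}}_i) \log_2 p(\tilde{\mathbf{X}}_i) \tag{3-4-40} $$

For a given average rate $R$, the minimum achievable distortion $D_n(R)$ is

$$ D_n(R) = \min_{Q(\mathbf{X})} E[d(\mathbf{X}, \tilde{\mathbf{X}})] \tag{3-4-41} $$

where $R \ge H(\tilde{\mathbf{X}})/n$ and the minimum in (3-4-41) is taken over all possible
mappings $Q(\mathbf{X})$. In the limit as the number of dimensions $n$ is allowed to
approach infinity, we obtain

$$ D(R) = \lim_{n \to \infty} D_n(R) \tag{3-4-42} $$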

where D(R) is the distortion-rate function that was introduced in the previous 
section. It is apparent from this development that the distortion-rate function 
can be approached arbitrarily closely by increasing the size n of the vectors. 

The development above is predicated on the assumption that the joint pdf 
p(X) of the data vector is known. However, in practice, the joint pdf p(X) of 
the data may not be known. In such a case, it is possible to select the 
quantized output vectors adaptively from a set of training vectors $\mathbf{X}(m)$.
Specifically, suppose that we are given a set of $M$ training vectors where $M$ is
much greater than $L$ ($M \gg L$). An iterative clustering algorithm, called the K
means algorithm, where in our case $K = L$, can be applied to the training
vectors. This algorithm iteratively subdivides the $M$ training vectors into $L$
clusters such that the two necessary conditions for optimality are satisfied. The
K means algorithm may be described as follows [Makhoul et al. (1985)].

K Means Algorithm 

Step 1 Initialize by setting the iteration number $i = 0$. Choose a set of
output vectors $\tilde{\mathbf{X}}_k(0)$, $1 \le k \le L$.

Step 2 Classify the training vectors $\{\mathbf{X}(m), 1 \le m \le M\}$ into the clusters
$\{C_k\}$ by applying the nearest-neighbor rule

$$ \mathbf{X} \in C_k(i) \quad \text{iff} \quad D(\mathbf{X}, \tilde{\mathbf{X}}_k(i)) \le D(\mathbf{X}, \tilde{\mathbf{X}}_j(i)) \quad \text{for all } k \ne j $$

Step 3 Recompute (set $i$ to $i + 1$) the output vectors of every cluster by
computing the centroid

$$ \tilde{\mathbf{X}}_k(i) = \frac{1}{M_k} \sum_{\mathbf{X} \in C_k} \mathbf{X}(m), \qquad 1 \le k \le L $$

of the training vectors that fall in each cluster. Also, compute the
resulting distortion $D(i)$ at the $i$th iteration.

Step 4 Terminate the test if the change $D(i - 1) - D(i)$ in the average
distortion is relatively small. Otherwise, go to Step 2.

The K means algorithm converges to a local minimum (see Anderberg, 
1973; Linde et al., 1980). By beginning the algorithm with different sets of 
initial output vectors $\{\tilde{\mathbf{X}}_k(0)\}$ and each time performing the optimization
described in the K means algorithm, it is possible to find a global optimum.
However, the computational burden of this search procedure may limit the
search to a few initializations. 
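
The four steps above can be sketched in Python as follows. This is a minimal illustration under our own assumptions rather than the implementation of Makhoul et al.: it uses the mean square error per dimension as the distortion, takes the initial output vectors as an argument, leaves an output vector unchanged if its cluster is empty, and stops on a relative-decrease threshold `rel_tol` that we have chosen arbitrarily. The synthetic training set is likewise hypothetical.

```python
import random


def distortion(u, v):
    """Mean square error per dimension between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def k_means(training, codebook, rel_tol=1e-6, max_iter=100):
    codebook = [list(c) for c in codebook]           # Step 1: initial output vectors
    previous = float("inf")
    for _ in range(max_iter):
        # Step 2: classify every training vector by the nearest-neighbor rule.
        clusters = [[] for _ in codebook]
        for x in training:
            k = min(range(len(codebook)), key=lambda j: distortion(x, codebook[j]))
            clusters[k].append(x)
        # Step 3: replace each output vector by the centroid of its cluster.
        for k, members in enumerate(clusters):
            if members:      # assumption: an empty cluster keeps its old vector
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
        average = sum(min(distortion(x, c) for c in codebook)
                      for x in training) / len(training)
        # Step 4: stop when the decrease in average distortion is relatively small.
        if previous - average <= rel_tol * average:
            break
        previous = average
    return codebook, average


# Hypothetical training set: two well-separated clusters in two dimensions.
rng = random.Random(1)
training = [(rng.gauss(cx, 0.1), rng.gauss(cy, 0.1))
            for cx, cy in ((0.0, 0.0), (5.0, 5.0)) for _ in range(100)]
codebook, average = k_means(training, codebook=[training[0], training[-1]])
print(codebook, average)
```

In this toy case the two output vectors settle near the two cluster centers. With less favorable training data or starting vectors the same routine may stop at a local minimum, which is why several initializations are usually tried.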

Once we have selected the output vectors $\{\tilde{\mathbf{X}}_k, 1 \le k \le L\}$, each signal vector
$\mathbf{X}(m)$ is quantized to the output vector that is nearest to it according to the
distortion measure that is adopted. If the computation involves evaluating
the distance between $\mathbf{X}(m)$ and each of the $L$ possible output vectors $\{\tilde{\mathbf{X}}_k\}$, the
procedure constitutes a full search. If we assume that each computation
requires $n$ multiplications and additions, the computational requirement for a
full search is

$$ \mathcal{C} = nL \tag{3-4-43} $$

multiplications and additions per input vector.

If we select L to be a power of 2 then $\log_2 L$ is the number of bits
required to represent each vector. Now, if R denotes the bit rate per sample
[per component or dimension of X(m)], we have $nR = \log_2 L$, and, hence, the
computational cost is

$$\mathscr{C} = n 2^{nR} \tag{3-4-44}$$


Note that the number of computations grows exponentially with the
dimensionality parameter n and the bit rate R per dimension. Because of this
exponential increase of the computational cost, vector quantization has been
applied to low-bit-rate source encoding, such as coding the reflection
coefficients or log area ratios in LPC.
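
To make the exponential growth concrete, the following short Python sketch
evaluates the full-search cost $n 2^{nR}$ for a few dimensions. The function
name full_search_cost and the choice R = 2 bits per dimension are illustrative
assumptions only.

```python
def full_search_cost(n, rate_per_dim):
    """Multiply-adds per input vector for a full search: n * 2**(n*R)."""
    levels = 2 ** (n * rate_per_dim)    # L = 2^{nR} output vectors
    return n * levels                   # n operations for each of the L vectors


for n in (1, 2, 4, 8):
    print(n, full_search_cost(n, 2))
```

At a fixed rate per dimension, doubling the dimension n squares the codebook
size L, which is why the cost rises so quickly.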

The computational cost associated with full search can be reduced by
slightly suboptimum algorithms (see Chang et al., 1984; Gersho, 1982).

In order to demonstrate the benefits of vector quantization compared with
scalar quantization, we present the following example taken from Makhoul et
al. (1985).


Example 3-4-1 


Let $x_1$ and $x_2$ be two random variables with a uniform joint pdf

$$p(x_1, x_2) \equiv p(X) = \begin{cases} \dfrac{1}{ab} & (X \in C) \\ 0 & (\text{otherwise}) \end{cases} \tag{3-4-45}$$


where C is the rectangular region illustrated in Fig. 3-4-4. Note that the
rectangle is rotated by 45° relative to the horizontal axis. Also shown in Fig.
3-4-4 are the marginal densities $p(x_1)$ and $p(x_2)$.

FIGURE 3-4-4 A uniform pdf in two dimensions. (Makhoul et al., 1985.)

If we quantize $x_1$ and $x_2$ separately by using uniform intervals of
length $\Delta$, the number of levels needed is

$$L_1 = L_2 = \frac{a + b}{\sqrt{2}\,\Delta} \tag{3-4-46}$$


Hence, the number of bits needed for coding the vector $X = [x_1 \ x_2]$ is

$$R_x = R_1 + R_2 = \log_2 L_1 + \log_2 L_2$$

$$R_x = \log_2 \frac{(a + b)^2}{2\Delta^2} \tag{3-4-47}$$

Thus, scalar quantization of each component is equivalent to vector
quantization with the total number of levels

$$L_x = L_1 L_2 = \frac{(a + b)^2}{2\Delta^2} \tag{3-4-48}$$


We observe that this approach is equivalent to covering the large square
that encloses the rectangle by square cells, where each cell represents one of
the $L_x$ quantized regions. Since $p(X) = 0$ except for $X \in C$, this
encoding is wasteful and results in an increase of the bit rate.

If we were to cover only the region for which $p(X) \ne 0$ with squares
having area $\Delta^2$, the total number of levels that will result is the
area of the rectangle divided by $\Delta^2$, i.e.,

$$L_x' = \frac{ab}{\Delta^2} \tag{3-4-49}$$

Therefore, the difference in bit rate between the scalar and vector
quantization methods is

$$R_x - R_x' = \log_2 \frac{(a + b)^2}{2ab} \tag{3-4-50}$$

For instance, if $a = 4b$, the difference in bit rate is

$$R_x - R_x' = 1.64 \ \text{bits/vector}$$

Thus, vector quantization is 0.82 bits/sample better for the same distortion.
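
The numbers in this example are easy to check numerically. The sketch below
evaluates (3-4-48), (3-4-49), and (3-4-50) in Python; the function names and
the particular values b = 1 and step 0.1 are assumptions made only for the
check.

```python
import math


def scalar_levels(a, b, delta):
    """L_x = L_1 L_2 from (3-4-48): cells covering the enclosing square."""
    return ((a + b) / (math.sqrt(2) * delta)) ** 2


def vector_levels(a, b, delta):
    """L_x' from (3-4-49): cells covering only the rectangle of area ab."""
    return a * b / delta ** 2


def rate_difference(a, b):
    """R_x - R_x' from (3-4-50), in bits per two-dimensional vector."""
    return math.log2((a + b) ** 2 / (2 * a * b))


diff = rate_difference(4.0, 1.0)        # the case a = 4b
print(round(diff, 2), round(diff / 2, 2))
print(math.log2(scalar_levels(4.0, 1.0, 0.1) / vector_levels(4.0, 1.0, 0.1)))
```

The last line confirms that the rate difference does not depend on the step
size, since the step cancels in the ratio of the two level counts.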


It is interesting to note that a linear transformation (rotation by 45°) will
decorrelate $x_1$ and $x_2$ and render the two random variables statistically
independent. Then scalar quantization and vector quantization achieve the
same efficiency. Although a linear transformation can decorrelate a vector of
random variables, it does not result in statistically independent random
variables, in general. Consequently, vector quantization will always equal or
exceed the performance of scalar quantization (see Problem 3-40).

Vector quantization has been applied to several types of speech encoding
methods including both waveform and model-based methods which are treated
in Section 3-5. In model-based methods such as LPC, vector quantization has
made possible the coding of speech at rates below 1000 bits/s (see Buzo et al.,
1980; Roucos et al., 1982; Paul, 1983). When applied to waveform encoding
methods, it is possible to obtain good quality speech at 16 000 bits/s, or,
equivalently, at R = 2 bits/sample. With additional computational complexity,
it may be possible in the future to implement waveform encoders producing
good quality speech at a rate of R = 1 bit/sample.


3-5 CODING TECHNIQUES FOR ANALOG SOURCES 

A number of coding techniques for analog sources have been developed over 
the past 40 years. Most of these have been applied to the encoding of speech 
and images. In this section, we briefly describe several of these methods and 
use speech encoding as an example in assessing their performance. 

It is convenient to subdivide analog source encoding methods into three
types. One type is called temporal waveform coding. In this type of encoding, 
the source encoder is designed to represent digitally the temporal characteris- 
tics of the source waveform. A second type of source encoding is spectral 
waveform coding. The signal waveform is usually subdivided into different 
frequency bands, and either the time waveform in each band or its spectral 
characteristics are encoded for transmission. The third type of source encoding 
is based on a mathematical model of the source and is called model-based 
coding. 


3-5-1 Temporal Waveform Coding 

There are several analog source coding techniques that are designed to 
represent the time-domain characteristics of the signal. The most commonly 
used methods are described in this section. 

Pulse Code Modulation† (PCM) Let $x(t)$ denote a sample function
emitted by a source and let $x_n$ denote the samples taken at a sampling rate
$f_s \ge 2W$, where W is the highest frequency in the spectrum of $x(t)$. In
PCM, each sample of the signal is quantized to one of $2^R$ amplitude levels,
where R is the number of binary digits used to represent each sample. Thus the
rate from the source is $R f_s$ bits/s.

The quantization process may be modeled mathematically as

$$\tilde{x}_n = x_n + q_n \tag{3-5-1}$$

where $\tilde{x}_n$ represents the quantized value of $x_n$ and $q_n$
represents the quantization error, which we treat as an additive noise.
Assuming that a uniform quantizer is used, having the input-output
characteristic illustrated in Fig. 3-5-1, the quantization noise is well
characterized statistically by the uniform pdf

$$p(q) = \frac{1}{\Delta}, \quad -\tfrac{1}{2}\Delta \le q \le \tfrac{1}{2}\Delta \tag{3-5-2}$$

where the step size of the quantizer is $\Delta = 2^{-R}$. The mean square
value of the quantization error is

$$E(q^2) = \tfrac{1}{12}\Delta^2 = \tfrac{1}{12} \times 2^{-2R} \tag{3-5-3}$$

Measured in decibels, the mean square value of the noise is

$$10 \log \tfrac{1}{12}\Delta^2 = 10 \log \left(\tfrac{1}{12} \times 2^{-2R}\right) = -6R - 10.8 \ \text{dB} \tag{3-5-4}$$

We observe that the quantization noise decreases by 6 dB/bit used in the
quantizer. For example, a 7 bit quantizer results in a quantization noise power
of -52.8 dB.

† PCM, DPCM, and ADPCM are source coding techniques. They are not digital
modulation methods.

FIGURE 3-5-1 Input-output characteristic for a uniform quantizer.
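
The noise formula can be compared with a simple simulation. The Python sketch
below is an illustration under assumptions not stated in the text: the input
is taken to be uniformly distributed over a unit range [-0.5, 0.5), the
quantizer is a midrise type with $2^R$ levels, and all function names are
hypothetical.

```python
import math
import random


def noise_power_db(bits):
    """Quantization noise power 10 log10(step^2 / 12) with step = 2^-R."""
    step = 2.0 ** (-bits)
    return 10 * math.log10(step ** 2 / 12)


def uniform_quantize(x, bits):
    """Midrise uniform quantizer for inputs in [-0.5, 0.5) (assumed range)."""
    step = 2.0 ** (-bits)
    return (math.floor(x / step) + 0.5) * step


def measured_noise_db(bits, trials=20000, seed=1):
    """Estimate the noise power from randomly drawn input samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.uniform(-0.5, 0.5)
        total += (uniform_quantize(x, bits) - x) ** 2
    return 10 * math.log10(total / trials)


for bits in (6, 7, 8):
    print(bits, round(noise_power_db(bits), 1), round(measured_noise_db(bits), 1))
```

The exact slope is about 6.02 dB per bit, so the computed values differ
slightly from the rounded figure -6R - 10.8 dB used in the text.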

Many source signals such as speech waveforms have the characteristic that
small signal amplitudes occur more frequently than large ones. However, a
uniform quantizer provides the same spacing between successive levels
throughout the entire dynamic range of the signal. A better approach is to
employ a nonuniform quantizer. A nonuniform quantizer characteristic is
usually obtained by passing the signal through a nonlinear device that
compresses the signal amplitude, followed by a uniform quantizer. For
example, a logarithmic compressor has an input-output magnitude
characteristic of the form

$$|y| = \frac{\log (1 + \mu |x|)}{\log (1 + \mu)} \tag{3-5-5}$$

where $|x| \le 1$ is the magnitude of the input, $|y|$ is the magnitude of the
output, and $\mu$ is a parameter that is selected to give the desired
compression characteristic. Figure 3-5-2 illustrates this compression
relationship for several values of $\mu$. The value $\mu = 0$ corresponds to
no compression.

FIGURE 3-5-2 Input-output magnitude characteristic for a logarithmic
compressor.

In the encoding of speech waveforms, for example, the value of $\mu = 255$ has
been adopted as a standard in the USA and Canada. This value results in
about a 24 dB reduction in the quantization noise power relative to uniform
quantization, as shown by Jayant (1974). Consequently, a 7 bit quantizer used
in conjunction with a $\mu = 255$ logarithmic compressor produces a
quantization noise power of approximately -77 dB compared with the -53 dB for
uniform quantization.

In the reconstruction of the signal from the quantized values, the inverse
logarithmic relation is used to expand the signal amplitude. The combined
compressor-expandor pair is termed a compandor.
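
A compressor-expandor pair based on (3-5-5) can be sketched as follows. The
expandor is obtained by inverting (3-5-5) algebraically; the function names
mu_compress and mu_expand are hypothetical, and the sign of the sample is
carried through unchanged, which is an assumption since (3-5-5) is stated for
magnitudes only.

```python
import math


def mu_compress(x, mu=255.0):
    """Logarithmic compressor of (3-5-5) applied to a sample with |x| <= 1."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_expand(y, mu=255.0):
    """Inverse relation: |x| = ((1 + mu)^|y| - 1) / mu."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


for x in (0.01, 0.1, 0.5, 1.0):
    y = mu_compress(x)
    print(x, round(y, 4), round(mu_expand(y), 4))
```

Small amplitudes are mapped to a much larger share of the output range, so a
uniform quantizer placed after the compressor spends more of its levels on
them.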


Differential Pulse Code Modulation (DPCM) In PCM, each sample of
the waveform is encoded independently of all the others. However, most
source signals sampled at the Nyquist rate or faster exhibit significant
correlation between successive samples. In other words, the average change in
amplitude between successive samples is relatively small. Consequently, an
encoding scheme that exploits the redundancy in the samples will result in a
lower bit rate for the source output.

A relatively simple solution is to encode the differences between successive
samples rather than the samples themselves. Since differences between samples
are expected to be smaller than the actual sampled amplitudes, fewer bits are
required to represent the differences. A refinement of this general approach is
to predict the current sample based on the previous p samples. To be specific,
let $x_n$ denote the current sample from the source and let $\hat{x}_n$ denote
the predicted value of $x_n$, defined as

$$\hat{x}_n = \sum_{i=1}^{p} a_i x_{n-i} \tag{3-5-6}$$

Thus $\hat{x}_n$ is a weighted linear combination of the past p samples and
the $\{a_i\}$ are the predictor coefficients. The $\{a_i\}$ are selected to
minimize some function of the error between $x_n$ and $\hat{x}_n$.

A mathematically and practically convenient error function is the mean
square error (MSE). With the MSE as the performance index for the predictor,
we select the $\{a_i\}$ to minimize

$$\mathscr{E}_p = E(e_n^2) = E\left[\left(x_n - \sum_{i=1}^{p} a_i x_{n-i}\right)^2\right]$$

$$= E(x_n^2) - 2 \sum_{i=1}^{p} a_i E(x_n x_{n-i}) + \sum_{i=1}^{p} \sum_{j=1}^{p} a_i a_j E(x_{n-i} x_{n-j}) \tag{3-5-7}$$

Assuming that the source output is (wide-sense) stationary, we may express
(3-5-7) as

$$\mathscr{E}_p = \phi(0) - 2 \sum_{i=1}^{p} a_i \phi(i) + \sum_{i=1}^{p} \sum_{j=1}^{p} a_i a_j \phi(i - j) \tag{3-5-8}$$

where $\phi(m)$ is the autocorrelation function of the sampled signal
sequence $x_n$. Minimization of $\mathscr{E}_p$ with respect to the predictor
coefficients $\{a_i\}$ results in the set of linear equations

$$\sum_{i=1}^{p} a_i \phi(i - j) = \phi(j), \quad j = 1, 2, \ldots, p \tag{3-5-9}$$

Thus, the values of the predictor coefficients are established. When the
autocorrelation function $\phi(n)$ is not known a priori, it may be estimated
from the samples $\{x_n\}$ using the relation†

$$\hat{\phi}(n) = \frac{1}{N} \sum_{i=1}^{N-n} x_i x_{i+n}, \quad n = 0, 1, 2, \ldots, p \tag{3-5-10}$$

and the estimate $\hat{\phi}(n)$ is used in (3-5-9) to solve for the
coefficients $\{a_i\}$. Note that the normalization factor of $1/N$ in
(3-5-10) drops out when $\hat{\phi}(n)$ is substituted in (3-5-9).

The linear equations in (3-5-9) for the predictor coefficients are called the
normal equations or the Yule-Walker equations. There is an algorithm
developed by Levinson (1947) and Durbin (1959) for solving these equations
efficiently. It is described in Appendix A. We shall deal with the solution in
greater detail in the subsequent discussion on linear predictive coding.
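
As a rough illustration of (3-5-9) and (3-5-10), the Python sketch below
estimates the autocorrelation from a block of samples and solves the normal
equations directly by Gaussian elimination. It does not use the
Levinson-Durbin algorithm; the direct solver and all function names are
assumptions chosen for brevity, and samples are indexed from zero.

```python
def autocorr(x, max_lag):
    """Estimate phi(n), n = 0..max_lag, as in (3-5-10) with 0-based indexing."""
    N = len(x)
    return [sum(x[i] * x[i + n] for i in range(N - n)) / N
            for n in range(max_lag + 1)]


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def predictor_coefficients(x, p):
    """Solve sum_i a_i phi(i - j) = phi(j), j = 1..p, for a_1..a_p."""
    phi = autocorr(x, p)
    A = [[phi[abs(i - j)] for i in range(p)] for j in range(p)]
    b = [phi[j + 1] for j in range(p)]
    return solve_linear(A, b)


x = [0.9 ** i for i in range(200)]      # a decaying test sequence
print(predictor_coefficients(x, 2))
```

For the decaying exponential used here, each sample is 0.9 times the previous
one, so the first coefficient comes out close to 0.9 and the second close to
zero.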


† The estimation of the autocorrelation function from a finite number of
observations $\{x_i\}$ is a separate issue, which is beyond the scope of this
discussion. The estimate in (3-5-10) is one that is frequently used in
practice.

FIGURE 3-5-3 (a) Block diagram of a DPCM encoder. (b) DPCM decoder at the
receiver.


Having described the method for determining the predictor coefficients, let
us now consider the block diagram of a practical DPCM system, shown in Fig.
3-5-3(a). In this configuration, the predictor is implemented with the feedback
loop around the quantizer. The input to the predictor is denoted by
$\tilde{x}_n$, which represents the signal sample $x_n$ modified by the
quantization process, and the output of the predictor is

$$\hat{\tilde{x}}_n = \sum_{i=1}^{p} a_i \tilde{x}_{n-i} \tag{3-5-11}$$

The difference

$$e_n = x_n - \hat{\tilde{x}}_n \tag{3-5-12}$$

is the input to the quantizer and $\tilde{e}_n$ denotes the output. Each value
of the quantized prediction error $\tilde{e}_n$ is encoded into a sequence of
binary digits and transmitted over the channel to the destination. The
quantized error $\tilde{e}_n$ is also added to the predicted value
$\hat{\tilde{x}}_n$ to yield $\tilde{x}_n$.

At the destination, the same predictor that was used at the transmitting end
is synthesized and its output $\hat{\tilde{x}}_n$ is added to $\tilde{e}_n$ to
yield $\tilde{x}_n$. The signal $\tilde{x}_n$ is the desired excitation for
the predictor and also the desired output sequence from which the
reconstructed signal $\tilde{x}(t)$ is obtained by filtering, as shown in Fig.
3-5-3(b).

The use of feedback around the quantizer, as described above, ensures that
the error in $\tilde{x}_n$ is simply the quantization error
$q_n = \tilde{e}_n - e_n$ and that there is no accumulation of previous
quantization errors in the implementation of the decoder. That is,

$$q_n = \tilde{e}_n - e_n = \tilde{e}_n - (x_n - \hat{\tilde{x}}_n) = \tilde{x}_n - x_n \tag{3-5-13}$$

Hence $\tilde{x}_n = x_n + q_n$. This means that the quantized sample
$\tilde{x}_n$ differs from the input $x_n$ by the quantization error $q_n$
independent of the predictor used. Therefore, the quantization errors do not
accumulate.
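
The property that quantization errors do not accumulate can be observed in a
small simulation. The following Python sketch implements the encoder with the
predictor inside the feedback loop and the matching decoder. The rounding
quantizer without overload, the first-order predictor coefficient 0.9, the
step size 0.05, and the function names are all illustrative assumptions.

```python
import math


def quantize(e, step):
    """Uniform rounding quantizer (no overload): error is at most step / 2."""
    return step * round(e / step)


def dpcm_encode(x, a, step):
    """Return (quantized errors, quantized samples) for predictor taps a."""
    p = len(a)
    past = [0.0] * p                    # past quantized samples, newest first
    e_q, x_q = [], []
    for xn in x:
        pred = sum(ai * xi for ai, xi in zip(a, past))      # (3-5-11)
        eq = quantize(xn - pred, step)                       # quantized (3-5-12)
        xq = pred + eq                  # feedback around the quantizer
        e_q.append(eq)
        x_q.append(xq)
        past = ([xq] + past)[:p]
    return e_q, x_q


def dpcm_decode(e_q, a):
    """Rebuild the quantized samples from the received quantized errors."""
    p = len(a)
    past = [0.0] * p
    out = []
    for eq in e_q:
        xq = sum(ai * xi for ai, xi in zip(a, past)) + eq
        out.append(xq)
        past = ([xq] + past)[:p]
    return out


x = [math.sin(0.1 * n) for n in range(200)]
e_q, x_q = dpcm_encode(x, [0.9], 0.05)
decoded = dpcm_decode(e_q, [0.9])
print(max(abs(u - v) for u, v in zip(decoded, x)))
```

The largest reconstruction error stays within half a step over the whole
sequence, in agreement with (3-5-13).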

In the DPCM system illustrated in Fig. 3-5-3, the estimate or predicted
value $\hat{\tilde{x}}_n$ of the signal sample $x_n$ is obtained by taking a
linear combination of past values $\tilde{x}_{n-k}$, $k = 1, 2, \ldots, p$, as
indicated by (3-5-11). An improvement in the quality of the estimate is
obtained by including linearly filtered past values of the quantized error.
Specifically, the estimate may be expressed as

$$\hat{\tilde{x}}_n = \sum_{i=1}^{p} a_i \tilde{x}_{n-i} + \sum_{i=1}^{m} b_i \tilde{e}_{n-i} \tag{3-5-14}$$

where $\{b_i\}$ are the coefficients of the filter for the quantized error
sequence $\tilde{e}_n$. The block diagrams of the encoder at the transmitter
and the decoder at the receiver are shown in Fig. 3-5-4. The two sets of
coefficients $\{a_i\}$ and $\{b_i\}$ are selected to minimize some function of
the error $e_n = x_n - \hat{\tilde{x}}_n$, such as the mean square error.

FIGURE 3-5-4 DPCM modified by the addition of linearly filtered error
sequence.

Adaptive PCM and DPCM Many real sources are quasistationary in 
nature. One aspect of the quasistationary characteristic is that the variance and 
the autocorrelation function of the source output vary slowly with time. PCM 
and DPCM encoders, however, are designed on the basis that the source 
output is stationary. The efficiency and performance of these encoders can be 
improved by having them adapt to the slowly time-variant statistics of the 
source. 

In both PCM and DPCM, the quantization error $q_n$ resulting from a uniform
quantizer operating on a quasistationary input signal will have a time-variant
variance (quantization noise power). One improvement that reduces the
dynamic range of the quantization noise is the use of an adaptive quantizer.
Although the quantizer can be made adaptive in different ways, a relatively
simple method is to use a uniform quantizer that varies its step size in
accordance with the variance of the past signal samples. For example, a
short-term running estimate of the variance of $x_n$ can be computed from the
input sequence $\{x_n\}$ and the step size can be adjusted on the basis of
such an estimate. In its simplest form, the algorithm for the step-size
adjustment employs only the previous signal sample. Such an algorithm has been
successfully used by Jayant (1974) in the encoding of speech signals. Figure
3-5-5 illustrates such a (3 bit) quantizer in which the step size is adjusted
recursively according to the relation

$$\Delta_{n+1} = \Delta_n M(n) \tag{3-5-15}$$

where $M(n)$ is a factor, whose value depends on the quantizer level for the
sample $x_n$, and $\Delta_n$ is the step size of the quantizer for processing
$x_n$. Values of the multiplication factors optimized for speech encoding have
been given by Jayant (1974). These values are displayed in Table 3-5-1 for 2,
3, and 4 bit adaptive quantization.

FIGURE 3-5-5 Example of a quantizer with an adaptive step size. (Jayant,
1974.)

TABLE 3-5-1 MULTIPLICATION FACTORS FOR ADAPTIVE STEP SIZE
ADJUSTMENT (JAYANT, 1974)

                   PCM                    DPCM
Bits        2      3      4        2      3      4

M(1)      0.60   0.85   0.80     0.80   0.90   0.90
M(2)      2.20   1.00   0.80     1.60   0.90   0.90
M(3)             1.00   0.80            1.25   0.90
M(4)             1.50   0.80            1.70   0.90
M(5)                    1.20                   1.20
M(6)                    1.60                   1.60
M(7)                    2.00                   2.00
M(8)                    2.40                   2.40
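
The recursion (3-5-15) can be sketched as follows for the 3 bit PCM column of
Table 3-5-1. Several details are assumptions made for illustration and are not
taken from the text: M(1) is associated with the innermost magnitude level and
M(4) with the outermost, the quantizer is a midrise type, and the step size is
clamped to hypothetical limits step_min and step_max.

```python
PCM_3BIT_MULTIPLIERS = (0.85, 1.00, 1.00, 1.50)   # M(1)..M(4), Table 3-5-1


def adaptive_quantize(samples, step=1.0, multipliers=PCM_3BIT_MULTIPLIERS,
                      step_min=1e-3, step_max=1e3):
    """Quantize samples, adapting the step as step <- step * M(level)."""
    outputs = []
    top = len(multipliers) - 1
    for x in samples:
        level = min(int(abs(x) / step), top)       # magnitude level 0..top
        magnitude = (level + 0.5) * step            # midrise output magnitude
        outputs.append(magnitude if x >= 0 else -magnitude)
        # (3-5-15): the next step depends on the level just used.
        step = min(max(step * multipliers[level], step_min), step_max)
    return outputs, step


print(adaptive_quantize([0.0, 0.0, 0.0])[1])       # inner levels shrink the step
print(adaptive_quantize([100.0, 100.0, 100.0])[1]) # outer levels enlarge it
```

Inner levels carry multipliers below one and outer levels multipliers above
one, so the step size tracks the short-term amplitude of the input.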

In DPCM, the predictor can also be made adaptive when the source output
is quasistationary. The coefficients of the predictor can be changed
periodically to reflect the changing signal statistics of the source. The
linear equations given by (3-5-9) still apply, with the short-term estimate of
the autocorrelation function of $x_n$ substituted in place of the ensemble
correlation function. The predictor coefficients thus determined may be
transmitted along with the quantized error $\tilde{e}_n$ to the receiver,
which implements the same predictor. Unfortunately, the transmission of the
predictor coefficients results in a higher bit rate over the channel,
offsetting, in part, the lower data rate achieved by having a quantizer with
fewer bits (fewer levels) to handle the reduced dynamic range in the error
$e_n$ resulting from adaptive prediction.

As an alternative, the predictor at the receiver may compute its own
prediction coefficients from $\tilde{e}_n$ and $\tilde{x}_n$, where

$$\tilde{x}_n = \tilde{e}_n + \sum_{i=1}^{p} a_i \tilde{x}_{n-i} \tag{3-5-16}$$

If we neglect the quantization noise, $\tilde{x}_n$ is equivalent to $x_n$.
Hence, $\tilde{x}_n$ may be used to estimate the autocorrelation function
$\phi(n)$ at the receiver, and the resulting estimates can be used in (3-5-9)
in place of $\phi(n)$ to solve for the predictor coefficients. For
sufficiently fine quantization, the difference between $x_n$ and
$\tilde{x}_n$ is very small. Hence, the estimate of $\phi(n)$ obtained from
$\tilde{x}_n$ is usually adequate for determining the predictor coefficients.
Implemented in this manner, the adaptive predictor results in a lower source
data rate.

Instead of using the block processing approach for determining the
predictor coefficients $\{a_i\}$ as described above, we may adapt the
predictor coefficients on a sample-by-sample basis by using a gradient-type
algorithm, similar in form to the adaptive gradient equalization algorithm
that is described in Chapter 11. Similar gradient-type algorithms have also
been devised for adapting the filter coefficients $\{a_i\}$ and $\{b_i\}$ of
the DPCM system shown in Fig. 3-5-4. For details on such algorithms, the
reader may refer to the book by Jayant and Noll (1984).

FIGURE 3-5-6 (a) Block diagram of a delta modulation system. (b) An
equivalent realization of a delta modulation system.


Delta Modulation (DM) Delta modulation may be viewed as a simplified
form of DPCM in which a two-level (1 bit) quantizer is used in conjunction
with a fixed first-order predictor. The block diagram of a DM encoder-decoder
is shown in Fig. 3-5-6(a). We note that

$$\hat{\tilde{x}}_n = \tilde{x}_{n-1} = \hat{\tilde{x}}_{n-1} + \tilde{e}_{n-1} \tag{3-5-17}$$

Since

$$q_n = \tilde{e}_n - e_n = \tilde{e}_n - (x_n - \hat{\tilde{x}}_n)$$

it follows that

$$\hat{\tilde{x}}_n = x_{n-1} + q_{n-1}$$

Thus the estimated (predicted) value of $x_n$ is really the previous sample
$x_{n-1}$ modified by the quantization noise $q_{n-1}$. We also note that the
difference equation (3-5-17) represents an integrator with an input
$\tilde{e}_n$. Hence, an equivalent realization of the one-step predictor is
an accumulator with an input equal to the quantized error signal
$\tilde{e}_n$. In general, the quantized error signal is scaled by some value,
say $\Delta_1$, which is called the step size. This equivalent realization is
illustrated in Fig. 3-5-6(b). In effect, the encoder shown in Fig. 3-5-6
approximates a waveform $x(t)$ by a linear staircase function. In order for
the approximation to be relatively good, the waveform $x(t)$ must change
slowly relative to the sampling rate. This requirement implies that the
sampling rate must be several (a factor of at least 5) times the Nyquist rate.

At any given sampling rate, the performance of the DM encoder is limited
by two types of distortion, as illustrated in Fig. 3-5-7. One is called
slope-overload distortion. It is due to the use of a step size $\Delta_1$ that
is too small to follow portions of the waveform that have a steep slope. The
second type of distortion, called granular noise, results from using a step
size that is too large in parts of the waveform having a small slope. The need
to minimize both of these two types of distortion results in conflicting
requirements in the selection of the step size $\Delta_1$. One solution is to
select $\Delta_1$ to minimize the sum of the mean square values of these two
distortions.
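
Both types of distortion can be reproduced with a very small simulation of the
accumulator realization. In the Python sketch below, the function name
delta_modulate, the convention that a zero error is coded as a positive step,
and the test inputs are illustrative assumptions.

```python
def delta_modulate(samples, step, start=0.0):
    """Fixed-step DM encoder: return (bits, staircase approximation)."""
    bits, staircase = [], []
    estimate = start                    # accumulator output (previous estimate)
    for x in samples:
        bit = 1 if x - estimate >= 0 else 0     # two-level (1 bit) quantizer
        estimate += step if bit else -step      # accumulate the scaled error
        bits.append(bit)
        staircase.append(estimate)
    return bits, staircase


# Flat input: the staircase hunts around the signal (granular noise).
print(delta_modulate([0.0] * 4, 0.5))

# Steep ramp: the staircase cannot keep up (slope overload).
ramp = [float(n) for n in range(1, 11)]
bits, stair = delta_modulate(ramp, 0.25)
print(bits, ramp[-1] - stair[-1])
```

A larger step reduces the lag on the ramp but enlarges the hunting on the flat
segment, which is the conflict in choosing the step size described above.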

Even when $\Delta_1$ is optimized to minimize the total mean square value of
the slope-overload distortion and the granular noise, the performance of the
DM encoder may still be less than satisfactory. An alternative solution is to
employ a variable step size that adapts itself to the short-term
characteristics of the source signal. That is, the step size is increased when
the waveform has a steep slope and decreased when the waveform has a
relatively small slope. This adaptive characteristic is illustrated in Fig.
3-5-8.

FIGURE 3-5-7 An example of slope overload distortion and granular noise in a
delta modulation encoder.

FIGURE 3-5-8

A variety of methods can be used to adaptively set the step size in every
iteration. The quantized error sequence $\tilde{e}_n$ provides a good
indication of the slope characteristics of the waveform being encoded. When
the quantized error $\tilde{e}_n$ is changing signs between successive
iterations, this is an indication that the slope of the waveform in that
locality is relatively small. On the other hand, when the waveform has a steep
slope, successive values of the error $\tilde{e}_n$ are expected to have
identical signs. From these observations, it is possible to devise algorithms
that decrease or increase the step size depending on successive values of
$\tilde{e}_n$. A relatively simple rule devised by Jayant (1970) is to
adaptively vary the step size according to the relation

$$\Delta_n = \Delta_{n-1} K^{\tilde{e}_n \tilde{e}_{n-1}}, \quad n = 1, 2, \ldots$$

where $K \ge 1$ is a constant that is selected to minimize the total
distortion. A block diagram of a DM encoder-decoder that incorporates this
adaptive algorithm is illustrated in Fig. 3-5-9.
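
A minimal Python sketch of this step-size rule is given below. It treats the
quantized error as a sign of plus or minus one, so that the exponent is +1
when successive signs agree and -1 when they differ. The value K = 1.5, the
initial sign, and the function name are assumptions for illustration only.

```python
def adaptive_delta_modulate(samples, step=1.0, k=1.5, start=0.0):
    """DM encoder whose step is multiplied by K or 1/K at every sample."""
    signs, steps, staircase = [], [], []
    estimate, previous_sign = start, 1          # assumed initial conditions
    for x in samples:
        sign = 1 if x - estimate >= 0 else -1   # quantized error, +1 or -1
        # Exponent is +1 when the signs agree and -1 when they differ.
        step = step * k if sign == previous_sign else step / k
        estimate += sign * step
        signs.append(sign)
        steps.append(step)
        staircase.append(estimate)
        previous_sign = sign
    return signs, steps, staircase


print(adaptive_delta_modulate([100.0] * 4)[1])  # steep input: step grows
print(adaptive_delta_modulate([0.0] * 4)[0])    # flat input: signs alternate
```

Under sustained slope overload the step grows geometrically, which lets the
staircase catch up much faster than with a fixed step.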

Several other variations of adaptive DM encoding have been investigated
and described in the technical literature. A particularly effective and popular
technique first proposed by Greefkes (1970) is called continuously variable
slope delta modulation (CVSD). In CVSD the adaptive step-size parameter
may be expressed as

$$\Delta_n = \alpha \Delta_{n-1} + k_1$$

if $\tilde{e}_n$, $\tilde{e}_{n-1}$, and $\tilde{e}_{n-2}$ have the same sign;
otherwise,

$$\Delta_n = \alpha \Delta_{n-1} + k_2$$

The parameters $\alpha$, $k_1$, and $k_2$ are selected such that
$0 < \alpha < 1$ and $k_1 \gg k_2 > 0$. For more discussion on this and other
variations of adaptive DM, the interested reader is referred to the papers by
Jayant (1974) and Flanagan et al. (1979), which contain extensive references.

FIGURE 3-5-9 An example of a delta modulation system with adaptive step size.

PCM, DPCM, adaptive PCM, and adaptive DPCM and DM are all source
encoding techniques that attempt to faithfully represent the output waveform
from the source. The following class of waveform encoding methods is based
on a spectral decomposition of the source signal.
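
The CVSD step-size rule described earlier in this section can be sketched in
Python as follows. The numerical values of alpha, k1, and k2 and the function
name cvsd_step are hypothetical; they are chosen only so that the conditions
0 < alpha < 1 and k1 much larger than k2 hold.

```python
def cvsd_step(previous_step, last_three_signs, alpha=0.9, k1=0.5, k2=0.01):
    """Return the next CVSD step size from the three most recent error signs."""
    same_sign = len(set(last_three_signs)) == 1
    return alpha * previous_step + (k1 if same_sign else k2)


# Persistent slope overload: three equal signs at every update.
step = 1.0
for _ in range(200):
    step = cvsd_step(step, (1, 1, 1))
print(step)

# Alternating signs: the step decays toward a small value.
step = 1.0
for _ in range(200):
    step = cvsd_step(step, (1, -1, 1))
print(step)
```

With a fixed sign pattern the recursion settles at k1/(1 - alpha) or
k2/(1 - alpha), so the two constants set the largest and smallest steps the
encoder will use.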


3-5-2 Spectral Waveform Coding 

In this section, we briefly describe waveform coding methods that filter the
source output signal into a number of frequency bands or subbands and
separately encode the signal in each subband. The waveform encoding may be
performed either on the time-domain waveforms in each subband or on the
frequency-domain representation of the corresponding time-domain waveform
in each subband.


Subband Coding In subband coding (SBC) of speech and image signals,
the signal is divided into a small number of subbands and the time waveform in
each subband is encoded separately. In speech coding, for example, the
lower-frequency bands contain most of the spectral energy in voiced speech. In
addition, quantization noise is more noticeable to the ear in the
lower-frequency bands. Consequently, more bits are used for the lower-band
signals and fewer are used for the higher-frequency bands.

Filter design is particularly important in achieving good performance in
SBC. In practice, quadrature-mirror filters (QMFs) are generally used because
they yield an alias-free response due to their perfect reconstruction property
(see Vaidyanathan, 1993). By using QMFs in subband coding, the
lower-frequency band is repeatedly subdivided by factors of two, thus creating
octave-band filters. The output of each QMF filter is decimated by a factor of
two, in order to reduce the sampling rate. For example, suppose that the
bandwidth of a speech signal extends to 3200 Hz. The first pair of QMFs
divides the spectrum into the low (0-1600 Hz) and high (1600-3200 Hz) bands.
Then, the low band is split into low (0-800 Hz) and high (800-1600 Hz) bands
by the use of another pair of QMFs. A third subdivision by another pair of
QMFs can split the 0-800 Hz band into low (0-400 Hz) and high (400-800 Hz)
bands. Thus, with three pairs of QMFs, we have obtained signals in the
frequency bands 0-400, 400-800, 800-1600 and 1600-3200 Hz. The
time-domain signal in each subband may now be encoded with different
precision. In practice, adaptive PCM has been used for waveform encoding of
the signal in each subband.
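
The octave-band structure of this example, and the idea of a two-band split
with decimation by two, can be sketched in Python. The text does not specify
the filters, so the two-tap Haar pair used here is purely an assumption: it is
the shortest filter pair with the mirror and perfect-reconstruction
properties, and practical coders use much longer filters. All function names
are hypothetical.

```python
import math


def octave_bands(bandwidth_hz, splits):
    """Band edges obtained by repeatedly halving the lowest band."""
    bands, upper = [], float(bandwidth_hz)
    for _ in range(splits):
        bands.append((upper / 2, upper))
        upper /= 2
    bands.append((0.0, upper))
    return bands[::-1]


def haar_analysis(x):
    """Split an even-length block into low and high bands, decimated by 2."""
    s = math.sqrt(0.5)
    low = [s * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    high = [s * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return low, high


def haar_synthesis(low, high):
    """Recombine the two decimated bands into the original block."""
    s = math.sqrt(0.5)
    out = []
    for l, h in zip(low, high):
        out += [s * (l + h), s * (l - h)]
    return out


print(octave_bands(3200, 3))
low, high = haar_analysis([1.0, 2.0, 4.0, 3.0])
print(haar_synthesis(low, high))
```

Applying haar_analysis again to the low output gives the next octave split, in
the same way that the second and third QMF pairs act on the low band in the
example above.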


Adaptive Transform Coding In adaptive transform coding (ATC), the
source signal is sampled and subdivided into frames of $N_f$ samples, and the
data in each frame is transformed into the spectral domain for coding and
transmission. At the source decoder, each frame of spectral samples is
transformed back into the time domain and the signal is synthesized from the
time-domain samples and passed through a D/A converter. To achieve coding
efficiency, we assign more bits to the more important spectral coefficients and
fewer bits to the less important spectral coefficients. In addition, by designing
an adaptive allocation in the assignment of the total number of bits to the
spectral coefficients, we can adapt to possibly changing statistics of the source
signal.

An objective in selecting the transformation from the time domain to the
frequency domain is to achieve uncorrelated spectral samples. In this sense, the
Karhunen-Loève transform (KLT) is optimal in that it yields spectral values
that are uncorrelated, but the KLT is generally difficult to compute (see
Wintz, 1972). The DFT and the discrete cosine transform (DCT) are viable
alternatives, although they are suboptimum. Of these two, the DCT yields
good performance compared with the KLT, and is generally used in practice
(see Campanella and Robinson, 1971; Zelinsky and Noll, 1977).
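
The transform step of ATC can be illustrated with a direct implementation of
the DCT. The text does not say which form of the DCT is meant, so the
orthonormal type-II DCT and its inverse are assumed here; the function names
and the short test frames are likewise illustrative.

```python
import math


def dct(frame):
    """Orthonormal type-II DCT of one frame (an assumed DCT variant)."""
    n = len(frame)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(
            frame[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out


def idct(coeffs):
    """Inverse of dct(): map the spectral samples back to the time domain."""
    n = len(coeffs)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * coeffs[k]
                * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]


frame = [1.0, 1.1, 1.3, 1.2, 1.0, 0.9, 0.8, 0.9]   # a slowly varying frame
coeffs = dct(frame)
print([round(c, 3) for c in coeffs])
print([round(v, 3) for v in idct(coeffs)])
```

For a slowly varying frame most of the energy falls in the first few
coefficients, which is what makes an unequal bit assignment worthwhile.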

In speech coding using ATC, it is possible to attain communication-quality 
speech at a rate of about 9600 bits/s. 


3-5-3 Model-Based Source Coding

In contrast to the waveform encoding methods described above, model-based
source coding represents a completely different approach. In this, the source is
modeled as a linear system (filter) that, when excited by an appropriate input
signal, results in the observed source output. Instead of transmitting the
samples of the source waveform to the receiver, the parameters of the linear
system are transmitted along with an appropriate excitation signal. If the
number of parameters is sufficiently small, the model-based methods provide a
large compression of the data.

The most widely used model-based coding method is called linear predictive
coding (LPC). In this, the sampled sequence, denoted by $x_n$,
$n = 0, 1, \ldots, N - 1$, is assumed to have been generated by an all-pole
(discrete-time) filter having the transfer function

$$H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} \tag{3-5-18}$$

Appropriate excitation functions are an impulse, a sequence of impulses, or a
sequence of white noise with unit variance. In any case, suppose that the input
sequence is denoted by $v_n$, $n = 0, 1, 2, \ldots$. Then the output sequence
of the all-pole model satisfies the difference equation

$$x_n = \sum_{k=1}^{p} a_k x_{n-k} + G v_n, \quad n = 0, 1, 2, \ldots \tag{3-5-19}$$

In general, the observed source output $x_n$, $n = 0, 1, 2, \ldots, N - 1$,
does not satisfy the difference equation (3-5-19), but only its model does. If
the input is a white-noise sequence or an impulse, we may form an estimate (or
prediction) of $x_n$ by the weighted linear combination

$$\hat{x}_n = \sum_{k=1}^{p} a_k x_{n-k}, \quad n > 0 \tag{3-5-20}$$

The difference between $x_n$ and $\hat{x}_n$, namely,

$$e_n = x_n - \hat{x}_n = x_n - \sum_{k=1}^{p} a_k x_{n-k} \tag{3-5-21}$$

represents the error between the observed value $x_n$ and the estimated
(predicted) value $\hat{x}_n$. The filter coefficients $\{a_k\}$ can be
selected to minimize the mean square value of this error.

Suppose for the moment that the input $\{v_n\}$ is a white-noise sequence.
Then, the filter output $x_n$ is a random sequence and so is the difference
$e_n = x_n - \hat{x}_n$. The ensemble average of the squared error is

$$\mathscr{E}_p = E(e_n^2) = E\left[\left(x_n - \sum_{k=1}^{p} a_k x_{n-k}\right)^2\right]$$

$$= \phi(0) - 2 \sum_{k=1}^{p} a_k \phi(k) + \sum_{k=1}^{p} \sum_{m=1}^{p} a_k a_m \phi(k - m) \tag{3-5-22}$$

where $\phi(m)$ is the autocorrelation function of the sequence $x_n$,
$n = 0, 1, \ldots, N - 1$. But $\mathscr{E}_p$ is identical to the MSE given
by (3-5-8) for a predictor used in DPCM. Consequently, minimization of
$\mathscr{E}_p$ in (3-5-22) yields the set of normal equations given
previously by (3-5-9). To completely specify the filter $H(z)$, we must also
determine the filter gain G. From (3-5-19), we have

$$E[(G v_n)^2] = G^2 E(v_n^2) = G^2 = E\left[\left(x_n - \sum_{k=1}^{p} a_k x_{n-k}\right)^2\right] = \mathscr{E}_p \tag{3-5-23}$$

where $\mathscr{E}_p$ is the residual MSE obtained from (3-5-22) by
substituting the optimum prediction coefficients, which result from the
solution of (3-5-9). With this substitution, the expression for
$\mathscr{E}_p$ and, hence, $G^2$ simplifies to

$$\mathscr{E}_p = G^2 = \phi(0) - \sum_{k=1}^{p} a_k \phi(k) \tag{3-5-24}$$

In practice, we do not usually know a priori the true autocorrelation
function of the source output. Hence, in place of $\phi(n)$, we substitute an
estimate $\hat{\phi}(n)$ as given by (3-5-10), which is obtained from the set
of samples $x_n$, $n = 0, 1, \ldots, N - 1$, emitted by the source.

As indicated previously, the Levinson-Durbin algorithm derived in Appen- 
dix A may be used to solve for the predictor coefficients {«*} recursively, 
beginning with a first-order predictor and iterating the order of the predictor 
up to order p. The recursive equations for the {aj may be expressed as 


4>(i) ~ u <b(i ~ k) 

k i 

= J / = 2, 3, . . 

e>, . | 

“ik = a, u - 1, i . 1 « k *£ j - 1 


■ ’P 


flu = -T- 


00 ) 

0(0) 


(3-5-25) 


^ 0 = 0 ( 0 ) 
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FIGURE 3-5-10 


where a ik , k = 1, 2, . . . , i, are the coefficients of the /th-order predictor. The 
desired coefficients for the predictor of order p are 

ak^apk, k = \,2,...,p (3-5-26) 

and the residual MSE is 

%=G 2 = 4>(0)-f,a k $(k) 

*=•1 

= $(0)ft(l-«5) (3-5-27) 

i=i 

We observe that the recursive relations in (3-5-25) give us not only the 
coefficients of the predictor for order p, but also the predictor coefficients of all 
orders less than p. 

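The residual MSE $\hat{\mathscr{E}}_i$, $i = 1, 2, \ldots, p$, forms a monotone decreasing
sequence, i.e., $\hat{\mathscr{E}}_p \le \hat{\mathscr{E}}_{p-1} \le \cdots \le \hat{\mathscr{E}}_1 \le \hat{\mathscr{E}}_0$, and the prediction
coefficients $a_{ii}$ satisfy the condition

$$|a_{ii}| < 1, \qquad i = 1, 2, \ldots, p \tag{3-5-28}$$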

This condition is necessary and sufficient for all the poles of H(z) to be inside 
the unit circle. Thus (3-5-28) ensures that the model is stable. 
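The recursion in (3-5-25) translates almost line by line into code. The following is a minimal sketch, not a reference implementation: the function name `levinson_durbin`, its return layout, and the test autocorrelation $\phi(k) = 0.5^{|k|}$ are illustrative assumptions that do not appear in the text.

```python
def levinson_durbin(phi, p):
    """Solve the normal equations recursively for predictor orders 1..p.

    phi : autocorrelation values phi[0], ..., phi[p] (true or estimated)
    Returns (a, refl, mse) where a[k-1] = a_{pk}, refl[i-1] = a_{ii},
    and mse[i] is the residual MSE of the order-i predictor (mse[0] = phi[0]).
    """
    a_prev = []          # coefficients a_{i-1,k} of the previous order
    refl = []            # a_{ii} for i = 1..p
    mse = [phi[0]]       # residual MSE for order 0
    for i in range(1, p + 1):
        # Numerator of a_ii: phi(i) - sum_{k=1}^{i-1} a_{i-1,k} phi(i-k)
        num = phi[i] - sum(a_prev[k - 1] * phi[i - k] for k in range(1, i))
        a_ii = num / mse[i - 1]
        # a_ik = a_{i-1,k} - a_ii * a_{i-1,i-k} for 1 <= k <= i-1
        a_cur = [a_prev[k - 1] - a_ii * a_prev[i - k - 1] for k in range(1, i)]
        a_cur.append(a_ii)
        # Residual MSE update: E_i = (1 - a_ii^2) E_{i-1}
        mse.append((1.0 - a_ii ** 2) * mse[i - 1])
        refl.append(a_ii)
        a_prev = a_cur
    return a_prev, refl, mse


# Hypothetical autocorrelation of a first-order autoregressive source.
phi = [0.5 ** k for k in range(4)]
a, refl, mse = levinson_durbin(phi, 3)
print("predictor coefficients:", a)
print("a_ii per order:", refl)
print("residual MSE per order:", mse)
```

For this assumed autocorrelation only the first coefficient is nonzero, so raising the order beyond 1 does not reduce the residual MSE, and every $a_{ii}$ satisfies (3-5-28).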

LPC has been successfully used in the modeling of a speech source. In this
case, the coefficients $a_{ii}$, $i = 1, 2, \ldots, p$, are called reflection coefficients as a
consequence of their correspondence to the reflection coefficients in the
acoustic tube model of the vocal tract (see Rabiner and Schafer, 1978; Deller et
al., 1993).

Once the predictor coefficients and the gain G have been estimated from the
source output $\{x_n\}$, each parameter is coded into a sequence of binary digits
and transmitted to the receiver. Source decoding or waveform synthesis may
be accomplished at the receiver as illustrated in Fig. 3-5-10. The signal
generator is used to produce the excitation function $\{v_n\}$, which is scaled by G


FIGURE 3-5-10 Block diagram of a waveform synthesizer (source decoder) for an LPC system.
FIGURE 3-5-11 Block diagram model of the generation of a speech signal.


to produce the desired input to the all-pole filter model H(z) synthesized from 
the received prediction coefficients. The analog signal may be reconstructed by 
passing the output sequence from H(z) through an analog filter that basically 
performs the function of interpolating the signal between sample points. In this 
realization of the waveform synthesizer, the excitation function and the gain 
parameter must be transmitted along with the prediction coefficients to the 
receiver. 

When the source output is stationary, the filter parameters need to be 
determined only once. However, the statistics of most sources encountered in 
practice are at best quasistationary. Under these circumstances, it is necessary 
to periodically obtain new estimates of the filter coefficients, the gain G, and 
the type of excitation function, and to transmit these estimates to the receiver. 


Example 3-5-1

The block diagram shown in Fig. 3-5-11 illustrates a model for a speech 
source. There are two mutually exclusive excitation functions to model 
voiced and unvoiced speech sounds. On a short-time basis, voiced speech is 
periodic with a fundamental frequency $f_0$ or a pitch period $1/f_0$ that depends
on the speaker. Thus voiced speech is generated by exciting an all-pole filter 
model of the vocal tract by a periodic impulse train with a period equal to 
the desired pitch period. Unvoiced speech sounds are generated by exciting 
the all-pole filter model by the output of a random-noise generator. The 
speech encoder at the transmitter must determine the proper excitation 
function, the pitch period for voiced speech, the gain parameter G, and the 
prediction coefficients. These parameters are encoded into binary digits and 
transmitted to the receiver. Typically, the voiced and unvoiced information 
requires 1 bit, the pitch period is adequately represented by 6 bits, and the 
gain parameter may be represented by 5 bits after its dynamic range is 
compressed logarithmically. The prediction coefficients require
8-10 bits/coefficient for adequate representation (see Rabiner and Schafer,
1978). The reason for such high accuracy is that relatively small changes in 
FIGURE 3-5-12 All-pole lattice filter for synthesizing the speech signal.


the prediction coefficients result in a large change in the pole positions of
the filter model H(z). The accuracy requirements may be lessened by
transmitting the reflection coefficients $a_{ii}$, which have a smaller dynamic
range. These are adequately represented by 6 bits. Thus, for a predictor of
order p = 10 (five complex-conjugate pole pairs in H(z)), the total number of
bits is 72. Due to the quasistationary nature of the speech signal, the linear
system model must be changed periodically, typically once every 15-30 ms.
Consequently, the bit rate from the source encoder is in the range
4800-2400 bits/s.
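The bit count and bit-rate range quoted in the example follow from simple arithmetic. The sketch below reproduces it; the constant and function names are illustrative only, and the allocation assumes that the 10 reflection coefficients (6 bits each) are sent instead of the prediction coefficients.

```python
# Bit allocation per frame as quoted in Example 3-5-1.
VOICING_BITS = 1                # voiced/unvoiced decision
PITCH_BITS = 6                  # pitch period
GAIN_BITS = 5                   # log-compressed gain G
BITS_PER_REFLECTION_COEFF = 6   # each reflection coefficient
PREDICTOR_ORDER = 10


def lpc_frame_bits(order=PREDICTOR_ORDER):
    """Total number of bits needed to describe one frame of the model."""
    return (VOICING_BITS + PITCH_BITS + GAIN_BITS
            + order * BITS_PER_REFLECTION_COEFF)


def lpc_bit_rate(frame_period_ms, order=PREDICTOR_ORDER):
    """Bit rate in bits/s when the model is updated every frame_period_ms."""
    return lpc_frame_bits(order) * 1000.0 / frame_period_ms


print(lpc_frame_bits())                      # bits per frame
print(lpc_bit_rate(15), lpc_bit_rate(30))    # bits/s at 15 ms and 30 ms updates
```

Updating the model every 15 ms gives the upper end of the range and every 30 ms gives the lower end.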

When the reflection coefficients are transmitted to the decoder, it is not 
necessary to recompute the prediction coefficients in order to realize the 
speech synthesizer. Instead, the synthesis is performed by realizing a lattice 
filter, shown in Fig. 3-5-12, which utilizes the reflection coefficients directly and
which is equivalent to the linear prediction filter. 
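The equivalence between the lattice realization and the direct-form all-pole filter can be checked numerically. The sketch below is one possible arrangement, assuming the sign convention of the Levinson-Durbin recursion (predictor $\hat{x}_n = \sum_k a_k x_{n-k}$); the function names and the sample reflection coefficients are hypothetical, and the stage ordering is not necessarily the one drawn in the figure.

```python
def reflection_to_predictor(refl):
    """Step-up recursion: a_ik = a_{i-1,k} - a_ii a_{i-1,i-k}, with a_ii = refl[i-1]."""
    a = []
    for i, k_i in enumerate(refl, start=1):
        a = [a[k - 1] - k_i * a[i - k - 1] for k in range(1, i)] + [k_i]
    return a


def lattice_synthesize(refl, excitation, gain=1.0):
    """All-pole lattice synthesis driven directly by the reflection coefficients."""
    p = len(refl)
    g_delayed = [0.0] * p            # g_delayed[i] holds g_i(n-1), i = 0..p-1
    out = []
    for v in excitation:
        f = gain * v                 # f_p(n): scaled excitation
        g_new = [0.0] * (p + 1)
        for i in range(p, 0, -1):
            f = f + refl[i - 1] * g_delayed[i - 1]           # f_{i-1}(n)
            g_new[i] = g_delayed[i - 1] - refl[i - 1] * f    # g_i(n)
        g_new[0] = f                 # g_0(n) = f_0(n) = output sample
        g_delayed = g_new[:p]
        out.append(f)
    return out


def direct_synthesize(a, excitation, gain=1.0):
    """Direct-form recursion x_n = sum_k a_k x_{n-k} + G v_n."""
    out = []
    for n, v in enumerate(excitation):
        x = gain * v + sum(a[k - 1] * out[n - k]
                           for k in range(1, len(a) + 1) if n - k >= 0)
        out.append(x)
    return out


refl = [0.5, -0.3, 0.2]              # hypothetical reflection coefficients
impulse = [1.0] + [0.0] * 15
lattice_out = lattice_synthesize(refl, impulse)
direct_out = direct_synthesize(reflection_to_predictor(refl), impulse)
print(max(abs(u - w) for u, w in zip(lattice_out, direct_out)))
```

The printed maximum difference between the two impulse responses should be at the level of floating-point rounding, which illustrates why the decoder does not need to recompute the prediction coefficients.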

The linear all-pole filter model, for which the filter coefficients are estimated 
via linear prediction, is by far the simplest linear model for a source. A more 
general source model is a linear filter that contains both poles and zeros. In a 
pole-zero model, the source output $x_n$ satisfies the difference equation

$$x_n = \sum_{k=1}^{p} a_k x_{n-k} + \sum_{k=0}^{q} b_k v_{n-k}$$

where $v_n$ is the input excitation sequence. The problem now is to estimate the
filter parameters $\{a_k\}$ and $\{b_k\}$ from the data $x_i$, $i = 0, 1, \ldots, N-1$, emitted by
the source. However, the MSE criterion applied to the minimization of the
error $e_n = x_n - \hat{x}_n$, where $\hat{x}_n$ is an estimate of $x_n$, results in a set of nonlinear
equations for the parameters $\{a_k\}$ and $\{b_k\}$. Consequently, the evaluation of the
$\{a_k\}$ and $\{b_k\}$ becomes tedious and difficult mathematically. To avoid having to
solve the nonlinear equations, a number of suboptimum methods have been
devised for pole-zero modeling. A discussion of these techniques would lead
us too far afield, however.

LPC as described above forms the basis for more complex model-based 
source encoding methods. When applied to speech coding, the model-based 
methods are generally called vocoders (for voice coders). In addition to the
conventional LPC vocoder described above, other types of vocoders that have 
been implemented include the residual excited LPC (RELP) vocoder, the 
multipulse LPC vocoder, the code-excited LPC (CELP) vocoder, and the 
vector-sum-excited LPC (VSELP) vocoder. The CELP and VSELP vocoders 
employ vector-quantized excitation codebooks to achieve communication 
quality speech at low bit rates. 

Before concluding this section, we consider the application of waveform 
encoding and LPC to the encoding of speech signals and compare the bit rates 
of these coding techniques. 

Encoding Methods Applied to Speech Signals The transmission of speech 
signals over telephone lines, radio channels, and satellite channels constitutes 
by far the largest part of our daily communications. It is understandable, 
therefore, that over the past three decades more research has been performed 
on speech encoding than on any other type of information-bearing signal. In 
fact, all the encoding techniques described in this section have been applied to 
the encoding of speech signals. It is appropriate, therefore, to compare the 
efficiency of these methods in terms of the bit rate required to transmit the 
speech signal. 

The speech signal is assumed to be band-limited to the frequency range 
200-3200 Hz and sampled at a nominal rate of 8000 samples/s for all encoders 
except DM, where the sampling rate $f_s$ is identical to the bit rate. For an LPC
encoder, the parameters given in Example 3-5-1 are assumed. 

Table 3-5-2 summarizes the main characteristics of the encoding methods 
described in this section and the required bit rate. In terms of the quality of the 
speech signal synthesized at the receiver from the (error-free) binary sequence, 
all the waveform encoding methods (PCM, DPCM, ADPCM, DM, ADM) 
provide telephone (toll) quality speech. In other words, a listener would have 
difficulty discerning the difference between the digitized speech and the analog 
speech waveform. ADPCM and ADM are particularly efficient waveform 
encoding techniques. With CVSD, it is possible to operate down to 9600 bits/s 


TABLE 3-5-2 ENCODING TECHNIQUES APPLIED TO SPEECH SIGNALS

Encoding method   Quantizer         Coder      Transmission rate (bits/s)
PCM               Linear            12 bits    96 000
Log PCM           Logarithmic       7-8 bits   56 000-64 000
DPCM              Logarithmic       4-6 bits   32 000-48 000
ADPCM             Adaptive          3-4 bits   24 000-32 000
DM                Binary            1 bit      32 000-64 000
ADM               Adaptive binary   1 bit      16 000-32 000
LPC                                            2400-4800
with some noticeable waveform distortion. In fact, at rates below 16 000 bits/s,
the distortion produced by waveform encoders increases significantly.
Consequently, these techniques are not used below 9600 bits/s.
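For the sample-by-sample waveform coders in Table 3-5-2, the transmission rate is the number of bits per sample times the 8000 samples/s sampling rate. A small sketch of that arithmetic follows; the dictionary layout and names are illustrative. DM and ADM are omitted because their sampling rate equals the bit rate rather than 8000 samples/s.

```python
SAMPLING_RATE = 8000  # samples/s, as assumed in the text for all coders except DM

# (minimum, maximum) bits per sample, taken from the "Coder" column of the table
BITS_PER_SAMPLE = {
    "PCM": (12, 12),
    "Log PCM": (7, 8),
    "DPCM": (4, 6),
    "ADPCM": (3, 4),
}


def transmission_rate(bits_per_sample, sampling_rate=SAMPLING_RATE):
    """Bit rate in bits/s of a coder that spends bits_per_sample on each sample."""
    return bits_per_sample * sampling_rate


rates = {name: (transmission_rate(lo), transmission_rate(hi))
         for name, (lo, hi) in BITS_PER_SAMPLE.items()}
for name, (lo, hi) in rates.items():
    print(f"{name}: {lo}-{hi} bits/s")
```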

For rates below 9600 bits/s, encoding techniques, such as LPC, that are 
based on linear models of the source are usually employed. The synthesized 
speech obtained from this class of encoding techniques is intelligible. However, 
the speech signal has a synthetic quality and there is noticeable distortion. 


3-6 BIBLIOGRAPHICAL NOTES AND REFERENCES 

Source coding has been an area of intense research activity since the 
publication of Shannon's classic papers in 1948 and the paper by Huffman 
(1952). Over the years, major advances have been made in the development of 
highly efficient source data compression algorithms. Of particular significance 
is the research on universal source coding and universal quantization published 
by Ziv (1985), Ziv and Lempel (1977, 1978), Davisson (1973), Gray (1975), and 
Davisson et al. (1981). 

Treatments of rate distortion theory are found in the books by Gallager 
(1968), Berger (1971), Viterbi and Omura (1979), Blahut (1987) and Gray
(1990). 

Much work has been done over the past several decades on speech encoding 
methods. Our treatment provides an overview of this important topic. A more 
comprehensive treatment is given in the books by Rabiner and Schafer (1978), 
Jayant and Noll (1984), and Deller et al. (1993). In addition to these texts,
there have been special issues of the IEEE Transactions on Communications 
(April 1979 and April 1982) and, more recently, the IEEE Journal on Selected 
Areas in Communications (February 1988) devoted to speech encoding. We 
should also mention the publication by IEEE Press of a book containing 
reprints of published papers on waveform quantization and coding, edited by 
Jayant (1976). 

Over the past decade, we have also seen a number of important develop- 
ments in vector quantization. Our treatment of this topic was based on the 
tutorial paper by Makhoul et al. (1985). A comprehensive treatment of vector 
quantization and signal compression is provided in the book by Gersho and 
Gray (1992). 


PROBLEMS 

3-1 Consider the joint experiment described in Problem 2-1 with the given joint
probabilities $P(A_i, B_j)$. Suppose we observe the outcomes $A_i$, $i = 1, 2, 3, 4$, of
experiment A.

a Determine the mutual information $I(B_j; A_i)$ for $j = 1, 2, 3$ and $i = 1, 2, 3, 4$, in
bits.

b Determine the average mutual information $I(B; A)$.

3-2 Suppose the outcomes $B_j$, $j = 1, 2, 3$, in Problem 3-1 represent the three possible
output letters from the DMS. Determine the entropy of the source.

3-3 Prove that $\ln u \le u - 1$ and also demonstrate the validity of this inequality by
plotting $\ln u$ and $u - 1$ on the same graph.

3-4 X and Y are two discrete random variables with probabilities

$$P(X = x, Y = y) \equiv P(x, y)$$

Show that $I(X; Y) \ge 0$, with equality if and only if X and Y are statistically
independent.

[Hint: Use the inequality $\ln u < u - 1$, for $0 < u < 1$, to show that $-I(X; Y) \le 0$.]

3-5 The output of a DMS consists of the possible letters $x_1, x_2, \ldots, x_n$, which occur
with probabilities $p_1, p_2, \ldots, p_n$, respectively. Prove that the entropy H(X) of the
source is at most $\log n$.

3-6 Determine the differential entropy H(X) of the uniformly distributed random
variable X with pdf

$$p(x) = \begin{cases} a^{-1} & (0 \le x \le a) \\ 0 & (\text{otherwise}) \end{cases}$$

for the following three cases:
a $a = 1$;
b $a = 4$;
c $a = \tfrac{1}{4}$.

Observe from these results that H( X) is not an absolute measure, but only a 
relative measure of randomness. 

3-7 A DMS has an alphabet of eight letters, $x_i$, $i = 1, 2, \ldots, 8$, with probabilities 0.25,
0.20, 0.15, 0.12, 0.10, 0.08, 0.05, and 0.05.

a Use the Huffman encoding procedure to determine a binary code for the source
output.

b Determine the average number $\bar{R}$ of binary digits per source letter.

c Determine the entropy of the source and compare it with $\bar{R}$.

3-8 A DMS has an alphabet of five letters, $x_i$, $i = 1, 2, \ldots, 5$, each occurring with
probability $\tfrac{1}{5}$. Evaluate the efficiency of a fixed-length binary code in which
a each letter is encoded separately into a binary sequence;
b two letters at a time are encoded into a binary sequence;
c three letters at a time are encoded into a binary sequence.

3-9 Recall (3-2-6):

$$I(x_i; y_j) = I(x_i) - I(x_i \mid y_j)$$

Prove that

a $I(x_i; y_j) = I(y_j) - I(y_j \mid x_i)$;

b $I(x_i; y_j) = I(x_i) + I(y_j) - I(x_i y_j)$, where $I(x_i y_j) = -\log P(x_i, y_j)$.

3-10 Let X be a geometrically distributed random variable; that is,

$$p(X = k) = p(1 - p)^{k-1}, \qquad k = 1, 2, 3, \ldots$$

a Find the entropy of X.

b Knowing that $X > K$, where K is a positive integer, what is the entropy of X?
3-11 Let X and Y denote two jointly distributed discrete valued random variables.
a Show that

$$H(X) = -\sum_{x,y} P(x, y) \log P(x)$$

$$H(Y) = -\sum_{x,y} P(x, y) \log P(y)$$

b Use the above result to show that

$$H(X, Y) \le H(X) + H(Y)$$

When does equality hold?
c Show that

$$H(X \mid Y) \le H(X)$$

with equality if and only if X and Y are independent.

3-12 Two binary random variables X and Y are distributed according to the joint
distributions $p(X = Y = 0) = p(X = 0, Y = 1) = p(X = Y = 1) = \tfrac{1}{3}$. Compute H(X),
H(Y), H(X | Y), H(Y | X), and H(X, Y).

3-13 A Markov process is a process with one-step memory, i.e., a process such that

$$p(x_n \mid x_{n-1}, x_{n-2}, x_{n-3}, \ldots) = p(x_n \mid x_{n-1})$$

for all n. Show that, for a stationary Markov process, the entropy rate is given by
$H(X_n \mid X_{n-1})$.

3-14 Let Y = g(X), where g denotes a deterministic function. Show that, in general,
$H(Y) \le H(X)$. When does equality hold?

3-15 Show that $I(X; Y) = H(X) + H(Y) - H(XY)$.

3-16 Show that, for statistically independent events,

$$H(X_1 X_2 \cdots X_n) = \sum_{i=1}^{n} H(X_i)$$

3-17 For a noiseless channel, show that $H(X \mid Y) = 0$.

3-18 Show that

$$I(X_3; X_2 \mid X_1) = H(X_3 \mid X_1) - H(X_3 \mid X_1 X_2)$$

and that

$$H(X_3 \mid X_1) \ge H(X_3 \mid X_1 X_2)$$

3-19 Let X be a random variable with pdf $p_X(x)$ and let Y = aX + b be a linear
transformation of X, where a and b are two constants. Determine the differential
entropy H(Y) in terms of H(X).

3-20 The outputs $x_1$, $x_2$, and $x_3$ of a DMS with corresponding probabilities $p_1 = 0.45$,
$p_2 = 0.35$, and $p_3 = 0.20$ are transformed by the linear transformation Y = aX + b,
where a and b are constants. Determine the entropy H(Y) and comment on what
effect the transformation has had on the entropy of X.

3-21 The optimum four-level nonuniform quantizer for a gaussian-distributed signal
amplitude results in the four levels $a_1$, $a_2$, $a_3$, and $a_4$, with corresponding
probabilities of occurrence $p_1 = p_2 = 0.3365$ and $p_3 = p_4 = 0.1635$.
FIGURE P3-22


a Design a Huffman code that encodes a single level at a time and determine the 
average bit rate. 

b Design a Huffman code that encodes two output levels at a time and determine 
the average bit rate. 

c What is the minimum rate obtained by encoding J output levels at a time as
$J \to \infty$?

3-22 A first-order Markov source is characterized by the state probabilities $P(x_i)$,
$i = 1, 2, \ldots, L$, and the transition probabilities $P(x_k \mid x_i)$, $k = 1, 2, \ldots, L$, and
$k \ne i$. The entropy of the Markov source is

$$H(X) = \sum_{k=1}^{L} P(x_k) H(X \mid x_k)$$

where $H(X \mid x_k)$ is the entropy conditioned on the source being in state $x_k$.

Determine the entropy of the binary, first-order Markov source shown in Fig.
P3-22, which has the transition probabilities $P(x_2 \mid x_1) = 0.2$ and $P(x_1 \mid x_2) = 0.3$.
[Note that the conditional entropies $H(X \mid x_1)$ and $H(X \mid x_2)$ are given by the
binary entropy functions $H[P(x_2 \mid x_1)]$ and $H[P(x_1 \mid x_2)]$, respectively.] How does
the entropy of the Markov source compare with the entropy of a binary DMS with
the same output letter probabilities $P(x_1)$ and $P(x_2)$?

3-23 A memoryless source has the alphabet $\mathscr{A} = \{-5, -3, -1, 0, 1, 3, 5\}$, with
corresponding probabilities {0.05, 0.1, 0.1, 0.15, 0.05, 0.25, 0.3}.
a Find the entropy of the source.

b Assuming that the source is quantized according to the quantization rule

$$q(-5) = q(-3) = 4$$
$$q(-1) = q(0) = q(1) = 0$$
$$q(3) = q(5) = 4$$

find the entropy of the quantized source. 

3-24 Design a ternary Huffman code, using 0, 1, and 2 as letters, for a source with 
output alphabet probabilities given by {0.05, 0.1, 0.15, 0.17, 0.18, 0.22, 0.13}. What
is the resulting average codeword length? Compare the average codeword length 
with the entropy of the source. (In what base would you compute the logarithms in 
the expression for the entropy for a meaningful comparison?) 

3-25 Find the Lempel-Ziv source code for the binary source sequence 

000100100000011000010000000100000010100001000000110100000001100

Recover the original sequence back from the Lempel-Ziv source code. 

[Hint: You require two passes of the binary sequence to decide on the size of the 
dictionary.] 

3-26 Find the differential entropy of the continuous random variable X in the following
cases:

a X is an exponential random variable with parameter $\lambda > 0$, i.e.,

$$f_X(x) = \begin{cases} \lambda^{-1} e^{-x/\lambda} & (x > 0) \\ 0 & (\text{otherwise}) \end{cases}$$

b X is a Laplacian random variable with parameter $\lambda > 0$, i.e.,

$$f_X(x) = \frac{1}{2\lambda} e^{-|x|/\lambda}$$

c X is a triangular random variable with parameter $\lambda > 0$, i.e.,

$$f_X(x) = \begin{cases} (x + \lambda)/\lambda^2 & (-\lambda \le x \le 0) \\ (-x + \lambda)/\lambda^2 & (0 < x \le \lambda) \\ 0 & (\text{otherwise}) \end{cases}$$


3-27 It can be shown that the rate-distortion function for a Laplacian source,
$f_X(x) = (2\lambda)^{-1} e^{-|x|/\lambda}$, with an absolute value of error-distortion measure
$d(x, \hat{x}) = |x - \hat{x}|$ is given by

$$R(D) = \begin{cases} \log(\lambda/D) & (0 \le D \le \lambda) \\ 0 & (D > \lambda) \end{cases}$$

(see Berger, 1971).

a How many bits per sample are required to represent the outputs of this source
with an average distortion not exceeding $\tfrac{1}{2}\lambda$?
b Plot R(D) for three different values of $\lambda$ and discuss the effect of changes in $\lambda$
on these plots.

3-28 It can be shown that if X is a zero-mean continuous random variable with variance
$\sigma^2$, its rate distortion function, subject to squared-error distortion measure,
satisfies the lower and upper bounds given by the inequalities

$$h(X) - \tfrac{1}{2}\log 2\pi e D \le R(D) \le \tfrac{1}{2}\log\frac{\sigma^2}{D}$$

where h(X) denotes the differential entropy of the random variable X (see Cover
and Thomas, 1991).

a Show that, for a Gaussian random variable, the lower and upper bounds
coincide.

b Plot the lower and upper bounds for a Laplacian source with $\sigma = 1$.
c Plot the lower and upper bounds for a triangular source with $\sigma = 1$.

3-29 A stationary random process has an autocorrelation function given by
$R_X(\tau) = \tfrac{1}{2}A^2 e^{-|\tau|} \cos 2\pi f_0 \tau$ and it is known that the random process never
exceeds 6 in magnitude. Assuming A = 6, how many quantization levels are required to
guarantee a signal-to-quantization noise ratio of at least 60 dB?

3-30 An additive white gaussian noise channel has the output Y = X + G, where X is
the channel input and G is the noise with probability density function

$$p(n) = \frac{1}{\sqrt{2\pi}\,\sigma_n} e^{-n^2/2\sigma_n^2}$$

If X is a white gaussian input with $E(X) = 0$ and $E(X^2) = \sigma_X^2$, determine
a the conditional differential entropy $H(X \mid G)$;
b the average mutual information $I(X; Y)$.

3-31 A DMS has an alphabet of eight letters, $x_i$, $i = 1, 2, \ldots, 8$, with probabilities
given in Problem 3-7. Use the Huffman encoding procedure to determine a ternary
code (using symbols 0, 1, and 2) for encoding the source output.

[Hint: Add a symbol $x_9$ with probability $p_9 = 0$, and group three symbols at a
time.]

3-32 Determine whether there exists a binary code with code word lengths
$(n_1, n_2, n_3, n_4) = (1, 2, 2, 3)$ that satisfy the prefix condition.

3-33 Consider a binary block code with $2^n$ code words of the same length n. Show that
the Kraft inequality is satisfied for such a code.

3-34 Show that the entropy of an n-dimensional gaussian vector $\mathbf{X} = [x_1\ x_2\ \cdots\ x_n]$
with zero mean and covariance matrix M is

$$H(\mathbf{X}) = \tfrac{1}{2}\log_2 (2\pi e)^n |M|$$

3-35 Consider a DMS with output bits (0, 1) that are equiprobable. Define the
distortion measure as $D = P_e$, where $P_e$ is the probability of error in transmitting
the binary symbols to the user over a BSC. Then the rate distortion function is
(Berger, 1971)

$$R(D) = 1 + D\log_2 D + (1 - D)\log_2 (1 - D), \qquad 0 \le D = P_e \le \tfrac{1}{2}$$

Plot R(D) for $0 \le D \le \tfrac{1}{2}$.

3-36 Evaluate the rate distortion function for an M-ary symmetric channel where
$D = P_M$ and

$$R(D) = \log_2 M + D\log_2 D + (1 - D)\log_2 \frac{1 - D}{M - 1}$$

for M = 2, 4, 8, and 16. $P_M$ is the probability of error.

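3-37 Consider the use of the weighted mean-square-error (MSE) distortion measure
defined as

$$d_W(\mathbf{X}, \hat{\mathbf{X}}) = (\mathbf{X} - \hat{\mathbf{X}})' W (\mathbf{X} - \hat{\mathbf{X}})$$

where W is a symmetric, positive-definite weighting matrix. By factorizing W as
$W = P'P$, show that $d_W(\mathbf{X}, \hat{\mathbf{X}})$ is equivalent to an unweighted MSE distortion
measure $d_2(\mathbf{X}', \hat{\mathbf{X}}')$ involving transformed vectors $\mathbf{X}'$ and $\hat{\mathbf{X}}'$.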

3-38 Consider a stationary stochastic signal sequence {X(n)} with zero mean and
autocorrelation sequence 


$\phi(n) =$


(n = 0)

(n = ±1)
(otherwise)


a Determine the prediction coefficient of the first-order minimum MSE predictor
for {X(n)} given by

$$\hat{x}(n) = a_1 x(n - 1)$$

and the corresponding minimum mean square error $\mathscr{E}_1$.
b Repeat (a) for the second-order predictor

$$\hat{x}(n) = a_1 x(n - 1) + a_2 x(n - 2)$$
3-39 Consider the encoding of the random variables $x_1$ and $x_2$ that are characterized by
the joint pdf $p(x_1, x_2)$ given by

$$p(x_1, x_2) = \begin{cases} 15/7ab & ((x_1, x_2) \in C) \\ 0 & (\text{otherwise}) \end{cases}$$

as shown in Fig. P3-39. Evaluate the bit rates required for uniform quantization of
$x_1$ and $x_2$ separately (scalar quantization) and combined (vector) quantization of
$(x_1, x_2)$. Determine the difference in bit rate when a = 4b.

FIGURE P3-39

3-40 Consider the encoding of two random variables X and Y that are uniformly
distributed on the region between two squares as shown in Fig. P3-40.
a Find $f_X(x)$ and $f_Y(y)$.

b Assume that each of the random variables X and Y are quantized using four
level uniform quantizers. What is the resulting distortion? What is the resulting
number of bits per (X, Y) pair?

c Now assume that instead of scalar quantizers for X and Y, we employ a vector
quantizer to achieve the same level of distortion as in (b). What is the resulting
number of bits per source output pair (X, Y)?

FIGURE P3-40

3-41 Two random variables X and Y are uniformly distributed on the square shown in
Fig. P3-41.

a Find $f_X(x)$ and $f_Y(y)$.

b Assume that each of the random variables X and Y are quantized using four
level uniform quantizers. What is the resulting distortion? What is the resulting
number of bits per (X, Y) pair?

c Now assume that, instead of scalar quantizers for X and Y, we employ a vector
quantizer with the same number of bits per source output pair (X, Y) as in (b).
What is the resulting distortion for this vector quantizer?

FIGURE P3-41



CHARACTERIZATION OF 
COMMUNICATION SIGNALS 
AND SYSTEMS 


Signals can be categorized in a number of different ways, such as random 
versus deterministic, discrete time versus continuous time, discrete amplitude 
versus continuous amplitude, lowpass versus bandpass, finite energy versus 
infinite energy, finite average power versus infinite average power, etc. In this 
chapter, we treat the characterization of signals and systems that are usually 
encountered in the transmission of digital information over a communication 
channel. In particular, we introduce the representation of various forms of 
digitally modulated signals and describe their spectral characteristics. 

We begin with the characterization of bandpass signals and systems, 
including the mathematical representation of bandpass stationary stochastic 
processes. Then, we present a vector space representation of signals. We 
conclude with the representation of digitally modulated signals and their 
spectral characteristics. 


4-1 REPRESENTATION OF BANDPASS SIGNALS 
AND SYSTEMS 

Many digital information-bearing signals are transmitted by some type of 
carrier modulation. The channel over which the signal is transmitted is limited 
in bandwidth to an interval of frequencies centered about the carrier, as in 
double-sideband modulation, or adjacent to the carrier, as in single-sideband 
modulation. Signals and channels (systems) that satisfy the condition that their 
bandwidth is much smaller than the carrier frequency are termed narrowband 
bandpass signals and channels (systems). The modulation performed at the 
FIGURE 4-1-1 Spectrum of a bandpass signal.


transmitting end of the communication system to generate the bandpass signal
and the demodulation performed at the receiving end to recover the digital 
information involve frequency translations. With no loss of generality and for 
mathematical convenience, it is desirable to reduce all bandpass signals and 
channels to equivalent lowpass signals and channels. As a consequence, the 
results of the performance of the various modulation and demodulation 
techniques presented in the subsequent chapters are independent of carrier 
frequencies and channel frequency bands. The representation of bandpass 
signals and systems in terms of equivalent lowpass waveforms and the 
characterization of bandpass stationary stochastic processes are the main topics 
of this section. 


4-1-1 Representation of Bandpass Signals

Suppose that a real-valued signal s(t) has a frequency content concentrated in 
a narrow band of frequencies in the vicinity of a frequency f c , as shown in Fig. 
4-1-1. Our objective is to develop a mathematical representation of such 
signals. First, we construct a signal that contains only the positive frequencies 
in s(t). Such a signal may be expressed as 

$$S_+(f) = 2u(f)S(f) \tag{4-1-1}$$

where $S(f)$ is the Fourier transform of $s(t)$ and $u(f)$ is the unit step function.
The equivalent time-domain expression for (4-1-1) is

$$s_+(t) = \int_{-\infty}^{\infty} S_+(f) e^{j2\pi ft}\,df = F^{-1}[2u(f)] \star F^{-1}[S(f)] \tag{4-1-2}$$

The signal $s_+(t)$ is called the analytic signal or the pre-envelope of $s(t)$. We
note that $F^{-1}[S(f)] = s(t)$ and

$$F^{-1}[2u(f)] = \delta(t) + \frac{j}{\pi t} \tag{4-1-3}$$
Hence,

$$s_+(t) = \left[\delta(t) + \frac{j}{\pi t}\right] \star s(t) = s(t) + j\,\frac{1}{\pi t} \star s(t) \tag{4-1-4}$$

We define $\hat{s}(t)$ as

$$\hat{s}(t) = \frac{1}{\pi t} \star s(t) = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{s(\tau)}{t - \tau}\,d\tau \tag{4-1-5}$$


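The signal $\hat{s}(t)$ may be viewed as the output of the filter with impulse response

$$h(t) = \frac{1}{\pi t}, \qquad -\infty < t < \infty \tag{4-1-6}$$

when excited by the input signal $s(t)$. Such a filter is called a Hilbert
transformer. The frequency response of this filter is simply

$$H(f) = \int_{-\infty}^{\infty} h(t) e^{-j2\pi ft}\,dt = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{1}{t}\, e^{-j2\pi ft}\,dt = \begin{cases} -j & (f > 0) \\ 0 & (f = 0) \\ j & (f < 0) \end{cases} \tag{4-1-7}$$

We observe that $|H(f)| = 1$ for $f \ne 0$ and that the phase response $\Theta(f) = -\tfrac{1}{2}\pi$ for $f > 0$
and $\Theta(f) = \tfrac{1}{2}\pi$ for $f < 0$. Therefore, this filter is basically a 90° phase shifter for
all frequencies in the input signal.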
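The 90° phase shift can be observed on sampled data by building a discrete analogue of the analytic signal: take the DFT, double the positive-frequency bins, zero the negative-frequency bins, and transform back. The sketch below is an illustration under stated assumptions, not the construction used in the text: it works on a finite periodic sequence, uses a direct $O(N^2)$ DFT for self-containment, and the names `dft`, `idft`, and `analytic_signal` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform (adequate for short sequences)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts)
                for n in range(n_pts)) for k in range(n_pts)]


def idft(spectrum):
    """Inverse of dft()."""
    n_pts = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_pts)
                for k in range(n_pts)) / n_pts for n in range(n_pts)]


def analytic_signal(s):
    """Discrete analogue of S_+(f) = 2u(f)S(f): returns s(n) + j*s_hat(n)."""
    n_pts = len(s)
    spectrum = dft(s)
    for k in range(n_pts):
        if k == 0 or (n_pts % 2 == 0 and k == n_pts // 2):
            continue             # DC and Nyquist bins are left unchanged
        if k < n_pts / 2:
            spectrum[k] *= 2     # positive frequencies are doubled
        else:
            spectrum[k] = 0      # negative frequencies are removed
    return idft(spectrum)


N, K0 = 64, 5                    # hypothetical length and frequency bin
s = [math.cos(2 * math.pi * K0 * n / N) for n in range(N)]
s_plus = analytic_signal(s)
s_hat = [z.imag for z in s_plus]  # Hilbert transform of s
print(max(abs(s_hat[n] - math.sin(2 * math.pi * K0 * n / N)) for n in range(N)))
```

For the cosine input chosen here the imaginary part of the result is the corresponding sine, i.e., the input delayed by a quarter period, and the printed deviation is at rounding level.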

The analytic signal $s_+(t)$ is a bandpass signal. We may obtain an equivalent
lowpass representation by performing a frequency translation of $S_+(f)$. Thus,
we define $S_l(f)$ as

$$S_l(f) = S_+(f + f_c) \tag{4-1-8}$$

The equivalent time-domain relation is

$$s_l(t) = s_+(t) e^{-j2\pi f_c t} = [s(t) + j\hat{s}(t)] e^{-j2\pi f_c t} \tag{4-1-9}$$

or, equivalently,

$$s(t) + j\hat{s}(t) = s_l(t) e^{j2\pi f_c t} \tag{4-1-10}$$

In general, the signal $s_l(t)$ is complex-valued (see Problem 4-5), and may be
expressed as

$$s_l(t) = x(t) + jy(t) \tag{4-1-11}$$
If we substitute for $s_l(t)$ from (4-1-11) into (4-1-10) and equate real and
imaginary parts on each side, we obtain the relations

$$s(t) = x(t)\cos 2\pi f_c t - y(t)\sin 2\pi f_c t \tag{4-1-12}$$

$$\hat{s}(t) = x(t)\sin 2\pi f_c t + y(t)\cos 2\pi f_c t \tag{4-1-13}$$

The expression (4-1-12) is the desired form for the representation of a
bandpass signal. The low-frequency signal components $x(t)$ and $y(t)$ may be
viewed as amplitude modulations impressed on the carrier components
$\cos 2\pi f_c t$ and $\sin 2\pi f_c t$, respectively. Since these carrier components are in
phase quadrature, $x(t)$ and $y(t)$ are called the quadrature components of the
bandpass signal $s(t)$.

Another representation of the signal in (4-1-12) is

$$s(t) = \mathrm{Re}\{[x(t) + jy(t)] e^{j2\pi f_c t}\} = \mathrm{Re}[s_l(t) e^{j2\pi f_c t}] \tag{4-1-14}$$

where Re denotes the real part of the complex-valued quantity in the brackets
following. The lowpass signal $s_l(t)$ is usually called the complex envelope of the
real signal $s(t)$, and is basically the equivalent lowpass signal.

Finally, a third possible representation of a bandpass signal is obtained by
expressing $s_l(t)$ as

$$s_l(t) = a(t) e^{j\theta(t)} \tag{4-1-15}$$

where

$$a(t) = \sqrt{x^2(t) + y^2(t)} \tag{4-1-16}$$

$$\theta(t) = \tan^{-1}\frac{y(t)}{x(t)} \tag{4-1-17}$$

Then

$$s(t) = \mathrm{Re}[s_l(t) e^{j2\pi f_c t}] = \mathrm{Re}[a(t) e^{j[2\pi f_c t + \theta(t)]}] = a(t)\cos[2\pi f_c t + \theta(t)] \tag{4-1-18}$$

The signal $a(t)$ is called the envelope of $s(t)$, and $\theta(t)$ is called the phase of $s(t)$.
Therefore, (4-1-12), (4-1-14), and (4-1-18) are equivalent representations of
bandpass signals.
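The equivalence of the three representations is easy to confirm numerically. The sketch below evaluates (4-1-12), (4-1-14), and (4-1-18) for one arbitrary choice of quadrature components; the carrier frequency `FC` and the functions `x` and `y` are hypothetical, and `math.atan2` is used as a four-quadrant version of the inverse tangent in (4-1-17).

```python
import cmath
import math

FC = 10.0  # hypothetical carrier frequency in Hz


def x(t):
    """Hypothetical in-phase component."""
    return math.exp(-t * t)


def y(t):
    """Hypothetical quadrature component."""
    return 0.5 * t * math.exp(-t * t)


def s_quadrature(t):
    """(4-1-12): x(t) cos(2 pi fc t) - y(t) sin(2 pi fc t)."""
    return (x(t) * math.cos(2 * math.pi * FC * t)
            - y(t) * math.sin(2 * math.pi * FC * t))


def s_complex_envelope(t):
    """(4-1-14): real part of s_l(t) exp(j 2 pi fc t), with s_l = x + j y."""
    return (complex(x(t), y(t)) * cmath.exp(2j * math.pi * FC * t)).real


def s_envelope_phase(t):
    """(4-1-18): a(t) cos(2 pi fc t + theta(t))."""
    a = math.hypot(x(t), y(t))        # envelope, (4-1-16)
    theta = math.atan2(y(t), x(t))    # phase, four-quadrant form of (4-1-17)
    return a * math.cos(2 * math.pi * FC * t + theta)


times = [-1.5 + 0.01 * i for i in range(301)]
worst = max(max(abs(s_quadrature(t) - s_complex_envelope(t)),
                abs(s_quadrature(t) - s_envelope_phase(t))) for t in times)
print(worst)
```

The printed worst-case difference between the three forms is at the level of floating-point rounding.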

The Fourier transform of $s(t)$ is

$$S(f) = \int_{-\infty}^{\infty} s(t) e^{-j2\pi ft}\,dt = \int_{-\infty}^{\infty} \{\mathrm{Re}[s_l(t) e^{j2\pi f_c t}]\} e^{-j2\pi ft}\,dt \tag{4-1-19}$$

Use of the identity

$$\mathrm{Re}(\xi) = \tfrac{1}{2}(\xi + \xi^*) \tag{4-1-20}$$

in (4-1-19) yields the result

$$S(f) = \frac{1}{2}\int_{-\infty}^{\infty} [s_l(t) e^{j2\pi f_c t} + s_l^*(t) e^{-j2\pi f_c t}] e^{-j2\pi ft}\,dt = \tfrac{1}{2}[S_l(f - f_c) + S_l^*(-f - f_c)] \tag{4-1-21}$$

where $S_l(f)$ is the Fourier transform of $s_l(t)$. This is the basic relationship
between the spectrum of the real bandpass signal $S(f)$ and the spectrum of the
equivalent lowpass signal $S_l(f)$.

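The energy in the signal $s(t)$ is defined as

$$\mathscr{E} = \int_{-\infty}^{\infty} s^2(t)\,dt = \int_{-\infty}^{\infty} \{\mathrm{Re}[s_l(t) e^{j2\pi f_c t}]\}^2\,dt \tag{4-1-22}$$

When the identity in (4-1-20) is used in (4-1-22), we obtain the following result:

$$\mathscr{E} = \frac{1}{2}\int_{-\infty}^{\infty} |s_l(t)|^2\,dt + \frac{1}{2}\int_{-\infty}^{\infty} |s_l(t)|^2 \cos[4\pi f_c t + 2\theta(t)]\,dt \tag{4-1-23}$$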

Consider the second integral in (4-1-23). Since the signal $s(t)$ is narrowband,
the real envelope $a(t) \equiv |s_l(t)|$ or, equivalently, $a^2(t)$ varies slowly relative to
the rapid variations exhibited by the cosine function. A graphical illustration of
the integrand in the second integral of (4-1-23) is shown in Fig. 4-1-2. The
value of the integral is just the net area under the cosine function modulated
by $a^2(t)$. Since the modulating waveform $a^2(t)$ varies slowly relative to the
cosine function, the net area contributed by the second integral is very small
relative to the value of the first integral in (4-1-23) and, hence, it can be
neglected. Thus, for all practical purposes, the energy in the bandpass signal
$s(t)$, expressed in terms of the equivalent lowpass signal $s_l(t)$, is

$$\mathscr{E} = \frac{1}{2}\int_{-\infty}^{\infty} |s_l(t)|^2\,dt \tag{4-1-24}$$

where $|s_l(t)|$ is just the envelope $a(t)$ of $s(t)$.

FIGURE 4-1-2 The signal $a^2(t)\cos[4\pi f_c t + 2\theta(t)]$.
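A quick numerical check illustrates how small the neglected term is for a narrowband signal. The sketch below is an illustration under assumptions not taken from the text: a Gaussian envelope $a(t) = e^{-t^2}$ with zero phase, a carrier of 20 Hz, and a plain Riemann sum for the integrals.

```python
import math

FC = 20.0      # hypothetical carrier frequency in Hz, large relative to the envelope bandwidth
DT = 1e-3      # integration step in seconds
T_MAX = 5.0    # integrate over [-T_MAX, T_MAX]; the envelope is negligible outside


def envelope(t):
    """Hypothetical slowly varying real envelope a(t) = |s_l(t)|."""
    return math.exp(-t * t)


def bandpass(t):
    """s(t) = a(t) cos(2 pi fc t), i.e. (4-1-18) with theta(t) = 0."""
    return envelope(t) * math.cos(2 * math.pi * FC * t)


n_steps = round(2 * T_MAX / DT)
ts = [-T_MAX + i * DT for i in range(n_steps + 1)]

energy_bandpass = sum(bandpass(t) ** 2 for t in ts) * DT        # (4-1-22)
energy_lowpass = 0.5 * sum(envelope(t) ** 2 for t in ts) * DT   # (4-1-24)

print(energy_bandpass, energy_lowpass)
```

For this choice of envelope and carrier the two printed values agree to many decimal places; lowering `FC` toward the bandwidth of the envelope makes the neglected cosine term visible.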


4-1-2 Representation of Linear Bandpass Systems 

A linear filter or system may be described either by its impulse response $h(t)$
or by its frequency response $H(f)$, which is the Fourier transform of $h(t)$.
Since $h(t)$ is real,

$$H^*(-f) = H(f) \tag{4-1-25}$$

Let us define $H_l(f - f_c)$ as

$$H_l(f - f_c) = \begin{cases} H(f) & (f > 0) \\ 0 & (f < 0) \end{cases} \tag{4-1-26}$$

Then

$$H_l^*(-f - f_c) = \begin{cases} 0 & (f > 0) \\ H^*(-f) & (f < 0) \end{cases} \tag{4-1-27}$$

Using (4-1-25), we have

$$H(f) = H_l(f - f_c) + H_l^*(-f - f_c) \tag{4-1-28}$$

which resembles (4-1-21) except for the factor $\tfrac{1}{2}$. The inverse transform of
$H(f)$ in (4-1-28) yields $h(t)$ in the form

$$h(t) = h_l(t) e^{j2\pi f_c t} + h_l^*(t) e^{-j2\pi f_c t} = 2\,\mathrm{Re}[h_l(t) e^{j2\pi f_c t}] \tag{4-1-29}$$

where $h_l(t)$ is the inverse Fourier transform of $H_l(f)$. In general, the impulse
response $h_l(t)$ of the equivalent lowpass system is complex-valued.


4-1-3 Response of a Bandpass System to a Bandpass Signal 

In Sections 4-1-1 and 4-1-2, we have shown that narrowband bandpass signals
and systems can be represented by equivalent lowpass signals and systems. In
this section, we demonstrate that the output of a bandpass system to a
bandpass input signal is simply obtained from the equivalent lowpass input
signal and the equivalent lowpass impulse response of the system.

Suppose that $s(t)$ is a narrowband bandpass signal and $s_l(t)$ is the equivalent
lowpass signal. This signal excites a narrowband bandpass system characterized
by its bandpass impulse response $h(t)$ or by its equivalent lowpass impulse
response $h_l(t)$. The output of the bandpass system is also a bandpass signal,
and, therefore, it can be expressed in the form

$$r(t) = \mathrm{Re}\left[r_l(t)e^{j2\pi f_c t}\right] \tag{4-1-30}$$

where $r(t)$ is related to the input signal $s(t)$ and the impulse response $h(t)$ by
the convolution integral

$$r(t) = \int_{-\infty}^{\infty} s(\tau)h(t - \tau)\,d\tau \tag{4-1-31}$$

Equivalently, the output of the system, expressed in the frequency domain, is

$$R(f) = S(f)H(f) \tag{4-1-32}$$

Substituting from (4-1-21) for $S(f)$ and from (4-1-28) for $H(f)$, we obtain the
result

$$R(f) = \frac{1}{2}\left[S_l(f - f_c) + S_l^*(-f - f_c)\right]\left[H_l(f - f_c) + H_l^*(-f - f_c)\right] \tag{4-1-33}$$

When $s(t)$ is a narrowband signal and $h(t)$ is the impulse response of a
narrowband system, $S_l(f - f_c) \approx 0$ and $H_l(f - f_c) = 0$ for $f < 0$. It follows from
this narrowband condition that

$$S_l(f - f_c)H_l^*(-f - f_c) = 0, \qquad S_l^*(-f - f_c)H_l(f - f_c) = 0$$

Therefore, (4-1-33) simplifies to

$$\begin{aligned} R(f) &= \frac{1}{2}\left[S_l(f - f_c)H_l(f - f_c) + S_l^*(-f - f_c)H_l^*(-f - f_c)\right] \\ &= \frac{1}{2}\left[R_l(f - f_c) + R_l^*(-f - f_c)\right] \end{aligned} \tag{4-1-34}$$

where

$$R_l(f) = S_l(f)H_l(f) \tag{4-1-35}$$

is the output spectrum of the equivalent lowpass system excited by the
equivalent lowpass signal. It is clear that the time domain relation for the
output $r_l(t)$ is given by the convolution of $s_l(t)$ with $h_l(t)$. That is,

$$r_l(t) = \int_{-\infty}^{\infty} s_l(\tau)h_l(t - \tau)\,d\tau \tag{4-1-36}$$

The combination of (4-1-36) with (4-1-30) gives the relationship between
the bandpass output signal $r(t)$ and the equivalent lowpass time functions $s_l(t)$
and $h_l(t)$. This simple relationship allows us to ignore any linear frequency
translations encountered in the modulation of a signal for purposes of
matching its spectral content to the frequency allocation of a particular
channel. Thus, for mathematical convenience, we shall deal only with the
transmission of equivalent lowpass signals through equivalent lowpass
channels.
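The relation between (4-1-30), (4-1-35), and (4-1-36) can be illustrated with a small discrete-time experiment. This is a sketch under assumptions that are not part of the text: the signals are periodic sequences of length `N`, convolution is circular (computed with the FFT), the lowpass spectra occupy only a few bins around zero, and the carrier sits exactly on DFT bin `k0`. Under those conditions the narrowband cross terms of (4-1-33) vanish exactly, so the two routes to the output should agree to rounding error.

```python
import numpy as np

def bandpass_output_two_ways(N=256, k0=40, half_bw=5, seed=0):
    """Filter a bandpass signal directly and via its lowpass equivalents."""
    rng = np.random.default_rng(seed)
    bins = np.arange(-half_bw, half_bw + 1) % N   # lowpass support around DC
    S_l = np.zeros(N, dtype=complex)
    H_l = np.zeros(N, dtype=complex)
    S_l[bins] = rng.normal(size=bins.size) + 1j * rng.normal(size=bins.size)
    H_l[bins] = rng.normal(size=bins.size) + 1j * rng.normal(size=bins.size)
    s_l = np.fft.ifft(S_l)                        # lowpass equivalent signal
    h_l = np.fft.ifft(H_l)                        # lowpass equivalent impulse response
    carrier = np.exp(2j * np.pi * k0 * np.arange(N) / N)
    s = np.real(s_l * carrier)                    # s = Re[s_l e^{j2 pi fc t}]
    h = 2.0 * np.real(h_l * carrier)              # (4-1-29)
    # Route 1: circular convolution of the real bandpass sequences, (4-1-32).
    r_direct = np.real(np.fft.ifft(np.fft.fft(s) * np.fft.fft(h)))
    # Route 2: lowpass filtering (4-1-35), then up-conversion (4-1-30).
    r_l = np.fft.ifft(S_l * H_l)
    r_equiv = np.real(r_l * carrier)
    return r_direct, r_equiv

r_direct, r_equiv = bandpass_output_two_ways()
print(np.max(np.abs(r_direct - r_equiv)))
```

The agreement relies on `k0` being larger than `half_bw` and `k0 + half_bw` being below `N/2`, which is the discrete counterpart of the narrowband condition.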


4-1-4 Representation of Bandpass Stationary 
Stochastic Processes 

The representation of bandpass signals presented in Section 4-1-1 applied to 
deterministic signals. In this section, we extend the representation to sample 
functions of a bandpass stationary stochastic process. In particular, we derive 
the important relations between the correlation functions and power spectra of 
the bandpass signal and the correlation functions and power spectra of the 
equivalent lowpass signal. 

Suppose that $n(t)$ is a sample function of a wide-sense stationary stochastic
process with zero mean and power spectral density $\Phi_{nn}(f)$. The power spectral
density is assumed to be zero outside of an interval of frequencies centered
around $\pm f_c$, where $f_c$ is termed the carrier frequency. The stochastic process
$n(t)$ is said to be a narrowband bandpass process if the width of the spectral
density is much smaller than $f_c$. Under this condition, a sample function of the
process $n(t)$ can be represented by any of the three equivalent forms given in
Section 4-1-1, namely,

$$\begin{aligned} n(t) &= a(t)\cos\left[2\pi f_c t + \theta(t)\right] & \text{(4-1-37)} \\ &= x(t)\cos 2\pi f_c t - y(t)\sin 2\pi f_c t & \text{(4-1-38)} \\ &= \mathrm{Re}\left[z(t)e^{j2\pi f_c t}\right] & \text{(4-1-39)} \end{aligned}$$

where $a(t)$ is the envelope and $\theta(t)$ is the phase of the real-valued signal, $x(t)$
and $y(t)$ are the quadrature components of $n(t)$, and $z(t)$ is called the complex
envelope of $n(t)$.

Let us consider the form given by (4-1-38) in more detail. First, we observe
that if $n(t)$ is zero mean, then $x(t)$ and $y(t)$ must also have zero mean values.
In addition, the stationarity of $n(t)$ implies that the autocorrelation and
cross-correlation functions of $x(t)$ and $y(t)$ satisfy the following properties:

$$\phi_{xx}(\tau) = \phi_{yy}(\tau) \tag{4-1-40}$$

$$\phi_{xy}(\tau) = -\phi_{yx}(\tau) \tag{4-1-41}$$

That these two properties follow from the stationarity of $n(t)$ is now
demonstrated. The autocorrelation function $\phi_{nn}(\tau)$ of $n(t)$ is

$$\begin{aligned} E[n(t)n(t + \tau)] &= E\{[x(t)\cos 2\pi f_c t - y(t)\sin 2\pi f_c t] \\ &\qquad \times [x(t + \tau)\cos 2\pi f_c(t + \tau) - y(t + \tau)\sin 2\pi f_c(t + \tau)]\} \\ &= \phi_{xx}(\tau)\cos 2\pi f_c t\,\cos 2\pi f_c(t + \tau) \\ &\quad + \phi_{yy}(\tau)\sin 2\pi f_c t\,\sin 2\pi f_c(t + \tau) \\ &\quad - \phi_{xy}(\tau)\sin 2\pi f_c t\,\cos 2\pi f_c(t + \tau) \\ &\quad - \phi_{yx}(\tau)\cos 2\pi f_c t\,\sin 2\pi f_c(t + \tau) \end{aligned} \tag{4-1-42}$$

Use of the trigonometric identities

$$\begin{aligned} \cos A\cos B &= \tfrac{1}{2}[\cos(A - B) + \cos(A + B)] \\ \sin A\sin B &= \tfrac{1}{2}[\cos(A - B) - \cos(A + B)] \\ \sin A\cos B &= \tfrac{1}{2}[\sin(A - B) + \sin(A + B)] \end{aligned} \tag{4-1-43}$$

in (4-1-42) yields the result

$$\begin{aligned} E[n(t)n(t + \tau)] &= \tfrac{1}{2}[\phi_{xx}(\tau) + \phi_{yy}(\tau)]\cos 2\pi f_c\tau \\ &\quad + \tfrac{1}{2}[\phi_{xx}(\tau) - \phi_{yy}(\tau)]\cos 2\pi f_c(2t + \tau) \\ &\quad - \tfrac{1}{2}[\phi_{yx}(\tau) - \phi_{xy}(\tau)]\sin 2\pi f_c\tau \\ &\quad - \tfrac{1}{2}[\phi_{yx}(\tau) + \phi_{xy}(\tau)]\sin 2\pi f_c(2t + \tau) \end{aligned} \tag{4-1-44}$$

Since $n(t)$ is stationary, the right-hand side of (4-1-44) must be independent of
$t$. But this condition can only be satisfied if (4-1-40) and (4-1-41) hold. As a
consequence, (4-1-44) reduces to

$$\phi_{nn}(\tau) = \phi_{xx}(\tau)\cos 2\pi f_c\tau - \phi_{yx}(\tau)\sin 2\pi f_c\tau \tag{4-1-45}$$

We note that the relation between the autocorrelation function $\phi_{nn}(\tau)$ of the
bandpass process and the autocorrelation and cross-correlation functions
$\phi_{xx}(\tau)$ and $\phi_{yx}(\tau)$ of the quadrature components is identical in form to
(4-1-38), which expresses the bandpass process in terms of the quadrature
components.

The autocorrelation function of the equivalent lowpass process

$$z(t) = x(t) + jy(t) \tag{4-1-46}$$

is defined as

$$\phi_{zz}(\tau) = \tfrac{1}{2}E[z^*(t)z(t + \tau)] \tag{4-1-47}$$

Substituting (4-1-46) into (4-1-47) and performing the expectation operation,
we obtain

$$\phi_{zz}(\tau) = \tfrac{1}{2}[\phi_{xx}(\tau) + \phi_{yy}(\tau) - j\phi_{xy}(\tau) + j\phi_{yx}(\tau)] \tag{4-1-48}$$

Now if the symmetry properties given in (4-1-40) and (4-1-41) are used in
(4-1-48), we obtain

$$\phi_{zz}(\tau) = \phi_{xx}(\tau) + j\phi_{yx}(\tau) \tag{4-1-49}$$

which relates the autocorrelation function of the complex envelope to the
autocorrelation and cross-correlation functions of the quadrature components.
Finally, we incorporate the result given by (4-1-49) into (4-1-45), and we have

$$\phi_{nn}(\tau) = \mathrm{Re}\left[\phi_{zz}(\tau)e^{j2\pi f_c\tau}\right] \tag{4-1-50}$$

Thus, the autocorrelation function $\phi_{nn}(\tau)$ of the bandpass stochastic process is
uniquely determined from the autocorrelation function $\phi_{zz}(\tau)$ of the
equivalent lowpass process $z(t)$ and the carrier frequency $f_c$.
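The chain from (4-1-42) to (4-1-50) is pure algebra, so it can be spot-checked with arbitrary numbers. In the sketch below the correlation values at a fixed lag, the carrier frequency, and the function names are hypothetical placeholders chosen only for the check. When the symmetry conditions (4-1-40) and (4-1-41) are imposed, the expanded form should no longer depend on $t$ and should equal both (4-1-45) and (4-1-50).

```python
import cmath
import math

def expanded_autocorr(phi_xx, phi_yy, phi_xy, phi_yx, fc, t, tau):
    """Right-hand side of (4-1-42) for given correlation values at lag tau."""
    a = 2 * math.pi * fc * t
    b = 2 * math.pi * fc * (t + tau)
    return (phi_xx * math.cos(a) * math.cos(b)
            + phi_yy * math.sin(a) * math.sin(b)
            - phi_xy * math.sin(a) * math.cos(b)
            - phi_yx * math.cos(a) * math.sin(b))

def reduced_autocorr(phi_xx, phi_yx, fc, tau):
    """Right-hand side of (4-1-45)."""
    w = 2 * math.pi * fc * tau
    return phi_xx * math.cos(w) - phi_yx * math.sin(w)

def envelope_autocorr(phi_xx, phi_yx, fc, tau):
    """Right-hand side of (4-1-50), using phi_zz from (4-1-49)."""
    phi_zz = complex(phi_xx, phi_yx)
    return (phi_zz * cmath.exp(2j * math.pi * fc * tau)).real

# Hypothetical values at one lag; phi_yy and phi_xy follow (4-1-40), (4-1-41).
phi_xx, phi_yx, fc, tau = 0.8, -0.3, 5.0, 0.013
for t in (0.0, 0.21, 1.7):
    value = expanded_autocorr(phi_xx, phi_xx, -phi_yx, phi_yx, fc, t, tau)
    print(t, value, reduced_autocorr(phi_xx, phi_yx, fc, tau))
```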

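The power density spectrum $\Phi_{nn}(f)$ of the stochastic process $n(t)$ is the
Fourier transform of $\phi_{nn}(\tau)$. Hence,

$$\begin{aligned} \Phi_{nn}(f) &= \int_{-\infty}^{\infty}\left\{\mathrm{Re}\left[\phi_{zz}(\tau)e^{j2\pi f_c\tau}\right]\right\}e^{-j2\pi f\tau}\,d\tau \\ &= \tfrac{1}{2}\left[\Phi_{zz}(f - f_c) + \Phi_{zz}(-f - f_c)\right] \end{aligned} \tag{4-1-51}$$

where $\Phi_{zz}(f)$ is the power density spectrum of the equivalent lowpass process
$z(t)$. Since the autocorrelation function of $z(t)$ satisfies the property
$\phi_{zz}(\tau) = \phi_{zz}^*(-\tau)$, it follows that $\Phi_{zz}(f)$ is a real-valued function of frequency.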

Properties of the Quadrature Components It was just demonstrated
above that the cross-correlation function of the quadrature components $x(t)$
and $y(t)$ of the bandpass stationary stochastic process $n(t)$ satisfies the
symmetry condition in (4-1-41). Furthermore, any cross-correlation function
satisfies the condition

$$\phi_{yx}(\tau) = \phi_{xy}(-\tau) \tag{4-1-52}$$

From these two conditions, we conclude that

$$\phi_{xy}(\tau) = -\phi_{xy}(-\tau) \tag{4-1-53}$$

That is, $\phi_{xy}(\tau)$ is an odd function of $\tau$. Consequently, $\phi_{xy}(0) = 0$, and, hence,
$x(t)$ and $y(t)$ are uncorrelated (for $\tau = 0$ only). Of course, this does not mean
that the processes $x(t)$ and $y(t + \tau)$ are uncorrelated for all $\tau$, since that would
imply that $\phi_{xy}(\tau) = 0$ for all $\tau$. If, indeed, $\phi_{xy}(\tau) = 0$ for all $\tau$, then $\phi_{zz}(\tau)$ is
real and the power spectral density $\Phi_{zz}(f)$ satisfies the condition

$$\Phi_{zz}(f) = \Phi_{zz}(-f) \tag{4-1-54}$$

and vice versa. That is, $\Phi_{zz}(f)$ is symmetric about $f = 0$.

FIGURE 4-1-3


In the special case in which the stationary stochastic process $n(t)$ is gaussian,
the quadrature components $x(t)$ and $y(t + \tau)$ are jointly gaussian. Moreover,
for $\tau = 0$, they are statistically independent, and, hence, their joint probability
density function is

$$p(x, y) = \frac{1}{2\pi\sigma^2}e^{-(x^2 + y^2)/2\sigma^2} \tag{4-1-55}$$

where the variance $\sigma^2$ is defined as $\sigma^2 = \phi_{xx}(0) = \phi_{yy}(0) = \phi_{nn}(0)$.

Representation of White Noise White noise is a stochastic process that is 
defined to have a flat (constant) power spectral density over the entire 
frequency range. This type of noise cannot be expressed in terms of quadrature 
components, as a result of its wideband character. 

In problems concerned with the demodulation of narrowband signals in 
noise, it is mathematically convenient to model the additive noise process as 
white and to represent the noise in terms of quadrature components. This can 
be accomplished by postulating that the signals and noise at the receiving 
terminal have passed through an ideal bandpass filter, having a passband that 
includes the spectrum of the signals but is much wider. Such a filter will 
introduce negligible, if any, distortion on the signal but it does eliminate the 
noise frequency components outside of the passband. 

The noise resulting from passing the white noise process through a
spectrally flat (ideal) bandpass filter is termed bandpass white noise and has the
power spectral density depicted in Fig. 4-1-3. Bandpass white noise can be
represented by any of the forms given in (4-1-37), (4-1-38), and (4-1-39). The
equivalent lowpass noise $z(t)$ has a power spectral density

$$\Phi_{zz}(f) = \begin{cases} N_0 & (|f| \le \tfrac{1}{2}B) \\ 0 & (|f| > \tfrac{1}{2}B) \end{cases} \tag{4-1-56}$$

and its autocorrelation function is

$$\phi_{zz}(\tau) = N_0\frac{\sin \pi B\tau}{\pi\tau} \tag{4-1-57}$$

The limiting form of $\phi_{zz}(\tau)$ as $B$ approaches infinity is

$$\phi_{zz}(\tau) = N_0\,\delta(\tau) \tag{4-1-58}$$

Bandpass noise with a flat spectrum.

The power spectral density for white noise and bandpass white noise is
symmetric about $f = 0$, so $\phi_{yx}(\tau) = 0$ for all $\tau$. Therefore,

$$\phi_{zz}(\tau) = \phi_{xx}(\tau) = \phi_{yy}(\tau) \tag{4-1-59}$$

That is, the quadrature components $x(t)$ and $y(t)$ are uncorrelated for all time
shifts $\tau$ and the autocorrelation functions of $z(t)$, $x(t)$, and $y(t)$ are all equal.
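The pair (4-1-56) and (4-1-57) is a Fourier transform pair, which can be confirmed numerically. The following is a minimal sketch: the values of $N_0$ and $B$, the grid size, and the function names are assumptions made for illustration. It integrates the flat spectrum with a hand-written trapezoidal rule and compares the result with the closed form, written as $N_0 B\,\mathrm{sinc}(B\tau)$ using NumPy's normalized `sinc`.

```python
import numpy as np

def autocorr_from_flat_psd(tau, n0=2.0, bandwidth=10.0, num=200001):
    """Inverse Fourier transform of the flat spectrum (4-1-56) at lag tau."""
    f = np.linspace(-bandwidth / 2, bandwidth / 2, num)
    df = f[1] - f[0]
    integrand = n0 * np.exp(2j * np.pi * f * tau)
    # Trapezoidal rule written out by hand (end points weighted by 1/2).
    integral = (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * df
    return integral.real

def autocorr_closed_form(tau, n0=2.0, bandwidth=10.0):
    """N0 sin(pi B tau) / (pi tau), i.e. (4-1-57); np.sinc(x) = sin(pi x)/(pi x)."""
    return n0 * bandwidth * np.sinc(bandwidth * tau)

for tau in (0.0, 0.03, 0.1, 0.37):
    print(tau, autocorr_from_flat_psd(tau), autocorr_closed_form(tau))
```

As `bandwidth` grows, the peak value $N_0 B$ at $\tau = 0$ increases while the main lobe narrows, which is the behavior summarized by the delta function in (4-1-58).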

4-2 SIGNAL SPACE REPRESENTATIONS 

In this section, we demonstrate that signals have characteristics that are similar 
to vectors and develop a vector representation for signal waveforms. We begin 
with some basic definitions and concepts involving vectors. 

4-2-1 Vector Space Concepts 

A vector $\mathbf{v}$ in an $n$-dimensional space is characterized by its $n$ components
$[v_1\ v_2\ \ldots\ v_n]$. It may also be represented as a linear combination of unit
vectors or basis vectors $\mathbf{e}_i$, $1 \le i \le n$, i.e.,

$$\mathbf{v} = \sum_{i=1}^{n} v_i\mathbf{e}_i \tag{4-2-1}$$

where, by definition, a unit vector has length unity and $v_i$ is the projection of
the vector $\mathbf{v}$ onto the unit vector $\mathbf{e}_i$.

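The inner product of two $n$-dimensional vectors $\mathbf{v}_1 = [v_{11}\ v_{12}\ \ldots\ v_{1n}]$ and
$\mathbf{v}_2 = [v_{21}\ v_{22}\ \ldots\ v_{2n}]$ is defined as

$$\mathbf{v}_1 \cdot \mathbf{v}_2 = \sum_{i=1}^{n} v_{1i}v_{2i} \tag{4-2-2}$$

Two vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal if $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$. More generally, a set of $m$
vectors $\mathbf{v}_k$, $1 \le k \le m$, are orthogonal if

$$\mathbf{v}_i \cdot \mathbf{v}_j = 0 \tag{4-2-3}$$

for all $1 \le i, j \le m$ and $i \ne j$.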

The norm of a vector $\mathbf{v}$ is denoted by $\|\mathbf{v}\|$ and is defined as

$$\|\mathbf{v}\| = (\mathbf{v} \cdot \mathbf{v})^{1/2} = \sqrt{\sum_{i=1}^{n} v_i^2} \tag{4-2-4}$$

which is simply its length. A set of m vectors is said to be orthonormal if the 
vectors are orthogonal and each vector has a unit norm. A set of m vectors is 
said to be linearly independent if no one vector can be represented as a linear 
combination of the remaining vectors. 

Two $n$-dimensional vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ satisfy the triangle inequality

$$\|\mathbf{v}_1 + \mathbf{v}_2\| \le \|\mathbf{v}_1\| + \|\mathbf{v}_2\| \tag{4-2-5}$$

with equality if $\mathbf{v}_1$ and $\mathbf{v}_2$ are in the same direction, i.e., $\mathbf{v}_1 = a\mathbf{v}_2$ where $a$ is a
positive real scalar. From the triangle inequality there follows the
Cauchy-Schwarz inequality

$$|\mathbf{v}_1 \cdot \mathbf{v}_2| \le \|\mathbf{v}_1\|\,\|\mathbf{v}_2\| \tag{4-2-6}$$

with equality if $\mathbf{v}_1 = a\mathbf{v}_2$. The norm square of the sum of two vectors may be
expressed as

$$\|\mathbf{v}_1 + \mathbf{v}_2\|^2 = \|\mathbf{v}_1\|^2 + \|\mathbf{v}_2\|^2 + 2\mathbf{v}_1 \cdot \mathbf{v}_2 \tag{4-2-7}$$

If $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal then $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$ and, hence,

$$\|\mathbf{v}_1 + \mathbf{v}_2\|^2 = \|\mathbf{v}_1\|^2 + \|\mathbf{v}_2\|^2 \tag{4-2-8}$$

This is the Pythagorean relation for two orthogonal $n$-dimensional vectors.

From matrix algebra, we recall that a linear transformation in an
$n$-dimensional vector space is a matrix transformation of the form

$$\mathbf{v}' = \mathbf{A}\mathbf{v} \tag{4-2-9}$$

where the matrix $\mathbf{A}$ transforms the vector $\mathbf{v}$ into some vector $\mathbf{v}'$. In the special
case where $\mathbf{v}' = \lambda\mathbf{v}$, i.e.,

$$\mathbf{A}\mathbf{v} = \lambda\mathbf{v} \tag{4-2-10}$$

where $\lambda$ is some (positive or negative) scalar, the vector $\mathbf{v}$ is called an
eigenvector of the transformation and $\lambda$ is the corresponding eigenvalue.

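Finally, let us review the Gram-Schmidt procedure for constructing a set of
orthonormal vectors from a set of $n$-dimensional vectors $\mathbf{v}_i$, $1 \le i \le m$. We
begin by arbitrarily selecting a vector from the set, say $\mathbf{v}_1$. By normalizing its
length, we obtain the first vector, say

$$\mathbf{u}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} \tag{4-2-11}$$

Next, we may select $\mathbf{v}_2$ and, first, subtract the projection of $\mathbf{v}_2$ onto $\mathbf{u}_1$. Thus, we
obtain

$$\mathbf{u}_2' = \mathbf{v}_2 - (\mathbf{v}_2 \cdot \mathbf{u}_1)\mathbf{u}_1 \tag{4-2-12}$$

Then, we normalize the vector $\mathbf{u}_2'$ to unit length. This yields

$$\mathbf{u}_2 = \frac{\mathbf{u}_2'}{\|\mathbf{u}_2'\|} \tag{4-2-13}$$

The procedure continues by selecting $\mathbf{v}_3$ and subtracting the projections of
$\mathbf{v}_3$ onto $\mathbf{u}_1$ and $\mathbf{u}_2$. Thus, we have

$$\mathbf{u}_3' = \mathbf{v}_3 - (\mathbf{v}_3 \cdot \mathbf{u}_1)\mathbf{u}_1 - (\mathbf{v}_3 \cdot \mathbf{u}_2)\mathbf{u}_2 \tag{4-2-14}$$

Then, the orthonormal vector $\mathbf{u}_3$ is

$$\mathbf{u}_3 = \frac{\mathbf{u}_3'}{\|\mathbf{u}_3'\|} \tag{4-2-15}$$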





By continuing this procedure, we shall construct a set of $n_1$ orthonormal
vectors, where $n_1 \le n$, in general. If $m < n$ then $n_1 \le m$, and if $m \ge n$ then
$n_1 \le n$.
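The steps (4-2-11) through (4-2-15) translate directly into a short routine. The sketch below is one possible implementation, not a prescribed one: the function name, the tolerance `tol` used to discard linearly dependent vectors, and the sample vectors are assumptions for illustration.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-10):
    """Return an orthonormal basis (rows) spanning the given vectors."""
    basis = []
    for v in vectors:
        v = np.asarray(v, dtype=float)
        u = v.copy()
        # Subtract the projection of v onto each vector found so far,
        # as in (4-2-12) and (4-2-14).
        for b in basis:
            u -= np.dot(v, b) * b
        norm = np.linalg.norm(u)
        # A (numerically) zero remainder means v depends on earlier vectors.
        if norm > tol:
            basis.append(u / norm)      # normalization, (4-2-13) and (4-2-15)
    return np.array(basis)

# Three vectors in 3-space; the third is the sum of the first two.
vectors = [[1, 1, 0], [1, 0, 1], [2, 1, 1]]
basis = gram_schmidt(vectors)
print(basis.shape)          # number of orthonormal vectors found, dimension n
print(basis @ basis.T)      # close to the identity matrix
```

Because the third vector is a linear combination of the other two, only two orthonormal vectors are produced, which illustrates $n_1 \le m$.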


4-2-2 Signal Space Concepts 

As in the case of vectors, we may develop a parallel treatment for a set of
signals defined on some interval $[a, b]$. The inner product of two generally
complex-valued signals $x_1(t)$ and $x_2(t)$ is denoted by $\langle x_1(t), x_2(t)\rangle$ and defined
as

$$\langle x_1(t), x_2(t)\rangle = \int_a^b x_1(t)x_2^*(t)\,dt \tag{4-2-16}$$

The signals are orthogonal if their inner product is zero.
The norm of a signal is defined as

$$\|x(t)\| = \left(\int_a^b |x(t)|^2\,dt\right)^{1/2} \tag{4-2-17}$$

A set of $m$ signals are orthonormal if they are orthogonal and their norms are
all unity. A set of $m$ signals is linearly independent if no signal can be
represented as a linear combination of the remaining signals.

The triangle inequality for two signals is simply

$$\|x_1(t) + x_2(t)\| \le \|x_1(t)\| + \|x_2(t)\| \tag{4-2-18}$$

and the Cauchy-Schwarz inequality is

$$\left|\int_a^b x_1(t)x_2^*(t)\,dt\right| \le \left|\int_a^b |x_1(t)|^2\,dt\right|^{1/2}\left|\int_a^b |x_2(t)|^2\,dt\right|^{1/2} \tag{4-2-19}$$

with equality when $x_2(t) = ax_1(t)$, where $a$ is any complex number.


4-2-3 Orthogonal Expansions of Signals 

In this section, we develop a vector representation for signal waveforms, and, 
thus, we demonstrate an equivalence between a signal waveform and its vector 
representation. 

Suppose that $s(t)$ is a deterministic, real-valued signal with finite energy

$$\mathcal{E}_s = \int_{-\infty}^{\infty}[s(t)]^2\,dt \tag{4-2-20}$$

Furthermore, suppose that there exists a set of functions $\{f_n(t), n = 1, 2, \ldots, N\}$
that are orthonormal in the sense that

$$\int_{-\infty}^{\infty} f_n(t)f_m(t)\,dt = \begin{cases} 0 & (m \ne n) \\ 1 & (m = n) \end{cases} \tag{4-2-21}$$

We may approximate the signal $s(t)$ by a weighted linear combination of
these functions, i.e.,

$$\hat{s}(t) = \sum_{k=1}^{K} s_k f_k(t) \tag{4-2-22}$$

where $\{s_k, 1 \le k \le K\}$ are the coefficients in the approximation of $s(t)$. The
approximation error incurred is

$$e(t) = s(t) - \hat{s}(t) \tag{4-2-23}$$

Let us select the coefficients $\{s_k\}$ so as to minimize the energy $\mathcal{E}_e$ of the
approximation error. Thus,

$$\mathcal{E}_e = \int_{-\infty}^{\infty}[s(t) - \hat{s}(t)]^2\,dt = \int_{-\infty}^{\infty}\left[s(t) - \sum_{k=1}^{K} s_k f_k(t)\right]^2 dt \tag{4-2-24}$$


The optimum coefficients in the series expansion of $s(t)$ may be found by
differentiating (4-2-24) with respect to each of the coefficients $\{s_k\}$ and setting
the first derivatives to zero. Alternatively, we may use a well-known result
from estimation theory based on the mean-square-error criterion, which,
simply stated, is that the minimum of $\mathcal{E}_e$ with respect to the $\{s_k\}$ is obtained
when the error is orthogonal to each of the functions in the series expansion.
Thus,

$$\int_{-\infty}^{\infty}\left[s(t) - \sum_{k=1}^{K} s_k f_k(t)\right]f_n(t)\,dt = 0, \qquad n = 1, 2, \ldots, K \tag{4-2-25}$$

Since the functions $\{f_n(t)\}$ are orthonormal, (4-2-25) reduces to

$$s_n = \int_{-\infty}^{\infty} s(t)f_n(t)\,dt, \qquad n = 1, 2, \ldots, K \tag{4-2-26}$$

Thus, the coefficients are obtained by projecting the signal $s(t)$ onto each of the
functions $\{f_n(t)\}$. Consequently, $\hat{s}(t)$ is the projection of $s(t)$ onto the
$K$-dimensional signal space spanned by the functions $\{f_n(t)\}$. The minimum
mean square approximation error is

$$\begin{aligned} \mathcal{E}_{\min} &= \int_{-\infty}^{\infty} e(t)s(t)\,dt \\ &= \int_{-\infty}^{\infty}[s(t)]^2\,dt - \int_{-\infty}^{\infty}\sum_{k=1}^{K} s_k f_k(t)s(t)\,dt \\ &= \mathcal{E}_s - \sum_{k=1}^{K} s_k^2 \end{aligned} \tag{4-2-27}$$

which is nonnegative, by definition.

When the minimum mean square approximation error $\mathcal{E}_{\min} = 0$,

$$\mathcal{E}_s = \sum_{k=1}^{K} s_k^2 = \int_{-\infty}^{\infty}[s(t)]^2\,dt \tag{4-2-28}$$

Under the condition that $\mathcal{E}_{\min} = 0$, we may express $s(t)$ as

$$s(t) = \sum_{k=1}^{K} s_k f_k(t) \tag{4-2-29}$$

where it is understood that equality of $s(t)$ to its series expansion holds in the
sense that the approximation error has zero energy.

When every finite energy signal can be represented by a series expansion of
the form in (4-2-29) for which $\mathcal{E}_{\min} = 0$, the set of orthonormal functions $\{f_n(t)\}$
is said to be complete.


Example 4-2-1: Trigonometric Fourier Series 

A finite energy signal $s(t)$ that is zero everywhere except in the range
$0 \le t \le T$ and has a finite number of discontinuities in this interval can be
represented in a Fourier series as

$$s(t) = \sum_{k=0}^{\infty}\left(a_k\cos\frac{2\pi kt}{T} + b_k\sin\frac{2\pi kt}{T}\right) \tag{4-2-30}$$

where the coefficients $\{a_k, b_k\}$ that minimize the mean square error are given
by

$$a_k = \frac{1}{\sqrt{T}}\int_0^T s(t)\cos\frac{2\pi kt}{T}\,dt, \qquad b_k = \frac{1}{\sqrt{T}}\int_0^T s(t)\sin\frac{2\pi kt}{T}\,dt \tag{4-2-31}$$

The set of trigonometric functions $\{\sqrt{2/T}\cos 2\pi kt/T, \sqrt{2/T}\sin 2\pi kt/T\}$ is
complete, and, hence, the series expansion results in zero mean square
error. These properties are easily established from the development given
above.
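The projection formulas (4-2-26) and the error expression (4-2-27) can be tried out on a truncated trigonometric expansion. The sketch below makes several assumptions that are not in the text: the test signal is a hypothetical two-level pulse, integrals are replaced by midpoint sums on `n` samples, and the constant basis function is taken as $1/\sqrt{T}$ so that every basis function has unit energy. The function names are likewise illustrative.

```python
import numpy as np

def trig_basis(T, harmonics, t):
    """Rows: 1/sqrt(T), then sqrt(2/T) cos and sin pairs for k = 1..harmonics."""
    funcs = [np.full_like(t, 1.0 / np.sqrt(T))]
    for k in range(1, harmonics + 1):
        funcs.append(np.sqrt(2.0 / T) * np.cos(2 * np.pi * k * t / T))
        funcs.append(np.sqrt(2.0 / T) * np.sin(2 * np.pi * k * t / T))
    return np.array(funcs)

def project(s, basis, dt):
    """Coefficients (4-2-26), approximation (4-2-22), error energy (4-2-24)."""
    coeffs = basis @ s * dt
    s_hat = coeffs @ basis
    err_energy = np.sum((s - s_hat) ** 2) * dt
    return coeffs, s_hat, err_energy

T, n = 1.0, 4096
dt = T / n
t = (np.arange(n) + 0.5) * dt                 # midpoint sampling grid
s = np.where(t < 0.5, 1.0, -0.5)              # hypothetical test signal
signal_energy = np.sum(s ** 2) * dt

for harmonics in (1, 5, 25):
    coeffs, _, err = project(s, trig_basis(T, harmonics, t), dt)
    # Error energy from the definition and from (4-2-27) should agree.
    print(harmonics, err, signal_energy - np.sum(coeffs ** 2))
```

The printed error energy shrinks as more harmonics are included, consistent with the completeness of the trigonometric set.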


Gram-Schmidt Procedure Now suppose that we have a set of finite
energy signal waveforms $\{s_i(t), i = 1, 2, \ldots, M\}$ and we wish to construct a set
of orthonormal waveforms. The Gram-Schmidt orthogonalization procedure
allows us to construct such a set. We begin with the first waveform $s_1(t)$, which
is assumed to have energy $\mathcal{E}_1$. The first waveform is simply constructed as

$$f_1(t) = \frac{s_1(t)}{\sqrt{\mathcal{E}_1}} \tag{4-2-32}$$

Thus, $f_1(t)$ is simply $s_1(t)$ normalized to unit energy.

The second waveform is constructed from $s_2(t)$ by first computing the
projection of $s_2(t)$ onto $f_1(t)$, which is

$$c_{12} = \int_{-\infty}^{\infty} s_2(t)f_1(t)\,dt \tag{4-2-33}$$

Then, $c_{12}f_1(t)$ is subtracted from $s_2(t)$ to yield

$$f_2'(t) = s_2(t) - c_{12}f_1(t) \tag{4-2-34}$$

This waveform is orthogonal to $f_1(t)$ but it does not have unit energy. If
$\mathcal{E}_2$ denotes the energy of $f_2'(t)$, the normalized waveform that is orthogonal to
$f_1(t)$ is

$$f_2(t) = \frac{f_2'(t)}{\sqrt{\mathcal{E}_2}} \tag{4-2-35}$$

In general, the orthogonalization of the $k$th function leads to

$$f_k(t) = \frac{f_k'(t)}{\sqrt{\mathcal{E}_k}} \tag{4-2-36}$$

where

$$f_k'(t) = s_k(t) - \sum_{i=1}^{k-1} c_{ik}f_i(t) \tag{4-2-37}$$

and

$$c_{ik} = \int_{-\infty}^{\infty} s_k(t)f_i(t)\,dt, \qquad i = 1, 2, \ldots, k - 1 \tag{4-2-38}$$

Thus, the orthogonalization process is continued until all the $M$ signal
waveforms $\{s_i(t)\}$ have been exhausted and $N \le M$ orthonormal waveforms
have been constructed. The dimensionality $N$ of the signal space will be equal
to $M$ if all the signal waveforms are linearly independent, i.e., none of the
signal waveforms is a linear combination of the other signal waveforms.


Example 4-2-2 

Let us apply the Gram-Schmidt procedure to the set of four waveforms
illustrated in Fig. 4-2-1(a). The waveform $s_1(t)$ has energy $\mathcal{E}_1 = 2$, so that
$f_1(t) = \sqrt{\tfrac{1}{2}}\,s_1(t)$. Next, we observe that $c_{12} = 0$; hence, $s_2(t)$ and $f_1(t)$ are
orthogonal. Therefore, $f_2(t) = s_2(t)/\sqrt{\mathcal{E}_2} = \sqrt{\tfrac{1}{2}}\,s_2(t)$. To obtain $f_3(t)$, we
compute $c_{13}$ and $c_{23}$, which are $c_{13} = \sqrt{2}$ and $c_{23} = 0$. Thus,

$$f_3'(t) = s_3(t) - \sqrt{2}\,f_1(t) = \begin{cases} -1 & (2 \le t \le 3) \\ 0 & (\text{otherwise}) \end{cases}$$

Gram-Schmidt orthogonalization of the signals $\{s_i(t), i = 1, 2, 3, 4\}$ and the corresponding
orthogonal signals.

Since $f_3'(t)$ has unit energy, it follows that $f_3(t) = f_3'(t)$. In determining $f_4(t)$,
we find that $c_{14} = -\sqrt{2}$, $c_{24} = 0$, and $c_{34} = 1$. Hence,

$$f_4'(t) = s_4(t) + \sqrt{2}\,f_1(t) - f_3(t) = 0$$

Consequently, $s_4(t)$ is a linear combination of $f_1(t)$ and $f_3(t)$ and, hence,
$f_4(t) = 0$. The three orthonormal functions are illustrated in Fig. 4-2-1(b).
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FIGURE 4-2-2 


Once we have constructed the set of orthonormal waveforms $\{f_n(t)\}$, we can
express the $M$ signals $\{s_n(t)\}$ as linear combinations of the $\{f_n(t)\}$. Thus, we may
write

$$s_k(t) = \sum_{n=1}^{N} s_{kn}f_n(t), \qquad k = 1, 2, \ldots, M \tag{4-2-39}$$

and

$$\mathcal{E}_k = \int_{-\infty}^{\infty}[s_k(t)]^2\,dt = \sum_{n=1}^{N} s_{kn}^2 = \|\mathbf{s}_k\|^2 \tag{4-2-40}$$

Based on the expression in (4-2-39), each signal may be represented by the
vector

$$\mathbf{s}_k = [s_{k1}\ s_{k2}\ \ldots\ s_{kN}] \tag{4-2-41}$$

or, equivalently, as a point in the $N$-dimensional signal space with coordinates
$\{s_{ki}, i = 1, 2, \ldots, N\}$. The energy in the $k$th signal is simply the square of the
length of the vector or, equivalently, the square of the Euclidean distance from
the origin to the point in the $N$-dimensional space. Thus, any signal can be
represented geometrically as a point in the signal space spanned by the
orthonormal functions $\{f_n(t)\}$.


Example 4-2-3 

Let us obtain the vector representation of the four signals shown in Fig.
4-2-1(a) by using the orthonormal set of functions in Fig. 4-2-1(b). Since the
dimensionality of the signal space is $N = 3$, each signal is described by three
components. The signal $s_1(t)$ is characterized by the vector $\mathbf{s}_1 = (\sqrt{2}, 0, 0)$.
Similarly, the signals $s_2(t)$, $s_3(t)$, and $s_4(t)$ are characterized by the vectors
$\mathbf{s}_2 = (0, \sqrt{2}, 0)$, $\mathbf{s}_3 = (\sqrt{2}, 0, 1)$, and $\mathbf{s}_4 = (-\sqrt{2}, 0, 1)$, respectively. These
vectors are shown in Fig. 4-2-2. Their lengths are $|\mathbf{s}_1| = \sqrt{2}$, $|\mathbf{s}_2| = \sqrt{2}$,
$|\mathbf{s}_3| = \sqrt{3}$, and $|\mathbf{s}_4| = \sqrt{3}$, and the corresponding signal energies are $\mathcal{E}_k = |\mathbf{s}_k|^2$,
$k = 1, 2, 3, 4$.

The four signal vectors represented as points in
three-dimensional function space.

FIGURE 4-2-3

We have demonstrated that a set of $M$ finite energy waveforms $\{s_n(t)\}$ can
be represented by a weighted linear combination of orthonormal functions
$\{f_n(t)\}$ of dimensionality $N \le M$. The functions $\{f_n(t)\}$ are obtained by applying
the Gram-Schmidt orthogonalization procedure on $\{s_n(t)\}$. It should be
emphasized, however, that the functions $\{f_n(t)\}$ obtained from the
Gram-Schmidt procedure are not unique. If we alter the order in which the
orthogonalization of the signals $\{s_n(t)\}$ is performed, the orthonormal
waveforms will be different and the corresponding vector representation of the
signals $\{s_n(t)\}$ will depend on the choice of the orthonormal functions $\{f_n(t)\}$.
Nevertheless, the vectors $\{\mathbf{s}_n\}$ will retain their geometrical configuration and
their lengths will be invariant to the choice of orthonormal functions.


Example 4-2-4 

An alternative set of orthonormal functions for the four signals in Fig. 4-2-1
is illustrated in Fig. 4-2-3(a). By using these functions to expand $\{s_n(t)\}$, we
obtain the corresponding vectors $\mathbf{s}_1 = (1, 1, 0)$, $\mathbf{s}_2 = (1, -1, 0)$, $\mathbf{s}_3 = (1, 1, -1)$,
and $\mathbf{s}_4 = (-1, -1, -1)$, which are shown in Fig. 4-2-3(b). Note that the
vector lengths are identical to those obtained from the orthonormal
functions $\{f_n(t)\}$.

An alternative set of orthonormal functions for the four signals in Fig. 4-2-1(a) and the
corresponding signal points.
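The invariance noted in Example 4-2-4 can be checked from the two sets of vectors quoted in Examples 4-2-3 and 4-2-4. The sketch below compares their Gram matrices (all pairwise inner products), which fix lengths, angles, and mutual distances. The helper names are illustrative, not from the text.

```python
import numpy as np

r2 = np.sqrt(2.0)
# Vectors from Example 4-2-3 (Gram-Schmidt basis).
vectors_a = np.array([[r2, 0, 0], [0, r2, 0], [r2, 0, 1], [-r2, 0, 1]])
# Vectors from Example 4-2-4 (alternative basis).
vectors_b = np.array([[1, 1, 0], [1, -1, 0], [1, 1, -1], [-1, -1, -1]], dtype=float)

def gram_matrix(vectors):
    """Matrix of all inner products s_m . s_k."""
    return vectors @ vectors.T

def pairwise_distances(vectors):
    """Matrix of Euclidean distances ||s_m - s_k||."""
    diff = vectors[:, None, :] - vectors[None, :, :]
    return np.linalg.norm(diff, axis=2)

print(np.linalg.norm(vectors_a, axis=1))   # lengths in the first representation
print(np.linalg.norm(vectors_b, axis=1))   # lengths in the second representation
print(np.allclose(gram_matrix(vectors_a), gram_matrix(vectors_b)))
```

Equal Gram matrices mean the two point sets differ only by a rotation or reflection of the coordinate axes, so the pairwise distances agree as well.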

The orthogonal expansions described above were developed for real-valued
signal waveforms. The extension to complex-valued signal waveforms is left as
an exercise for the reader (see Problems 4-6 and 4-7).

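Finally, let us consider the case in which the signal waveforms are bandpass
and represented as

$$s_m(t) = \mathrm{Re}\left[s_{lm}(t)e^{j2\pi f_c t}\right], \qquad m = 1, 2, \ldots, M \tag{4-2-42}$$

where $\{s_{lm}(t)\}$ denote the equivalent lowpass signals. Recall that the signal
energies may be expressed either in terms of $s_m(t)$ or $s_{lm}(t)$, as

$$\mathcal{E}_m = \int_{-\infty}^{\infty} s_m^2(t)\,dt = \frac{1}{2}\int_{-\infty}^{\infty}|s_{lm}(t)|^2\,dt \tag{4-2-43}$$

The similarity between any pair of signal waveforms, say $s_m(t)$ and $s_k(t)$, is
measured by the normalized cross-correlation

$$\frac{1}{\sqrt{\mathcal{E}_m\mathcal{E}_k}}\int_{-\infty}^{\infty} s_m(t)s_k(t)\,dt = \mathrm{Re}\left\{\frac{1}{2\sqrt{\mathcal{E}_m\mathcal{E}_k}}\int_{-\infty}^{\infty} s_{lm}^*(t)s_{lk}(t)\,dt\right\} \tag{4-2-44}$$

We define the complex-valued cross-correlation coefficient $\rho_{km}$ as

$$\rho_{km} = \frac{1}{2\sqrt{\mathcal{E}_m\mathcal{E}_k}}\int_{-\infty}^{\infty} s_{lm}^*(t)s_{lk}(t)\,dt \tag{4-2-45}$$

Then,

$$\mathrm{Re}(\rho_{km}) = \frac{1}{\sqrt{\mathcal{E}_m\mathcal{E}_k}}\int_{-\infty}^{\infty} s_m(t)s_k(t)\,dt \tag{4-2-46}$$

or, equivalently,

$$\mathrm{Re}(\rho_{km}) = \frac{\mathbf{s}_m \cdot \mathbf{s}_k}{\|\mathbf{s}_m\|\,\|\mathbf{s}_k\|} \tag{4-2-47}$$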


The cross-correlation coefficients between pairs of signal waveforms or
signal vectors comprise one set of parameters that characterize the similarity
of a set of signals. Another related parameter is the Euclidean distance $d_{km}^{(e)}$
between a pair of signals, defined as

$$\begin{aligned} d_{km}^{(e)} &= \|\mathbf{s}_m - \mathbf{s}_k\| \\ &= \left\{\int_{-\infty}^{\infty}[s_m(t) - s_k(t)]^2\,dt\right\}^{1/2} \\ &= \left\{\mathcal{E}_m + \mathcal{E}_k - 2\sqrt{\mathcal{E}_m\mathcal{E}_k}\,\mathrm{Re}(\rho_{km})\right\}^{1/2} \end{aligned} \tag{4-2-48}$$

When $\mathcal{E}_m = \mathcal{E}_k = \mathcal{E}$ for all $m$ and $k$, this expression simplifies to

$$d_{km}^{(e)} = \left\{2\mathcal{E}[1 - \mathrm{Re}(\rho_{km})]\right\}^{1/2} \tag{4-2-49}$$

Thus, the Euclidean distance is an alternative measure of the similarity (or
dissimilarity) of the set of signal waveforms or the corresponding signal
vectors.
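The relations (4-2-43) through (4-2-48) can be exercised on a pair of sampled bandpass waveforms. The sketch below rests on assumptions chosen for convenience and not taken from the text: a half-sine pulse shape, two lowpass equivalents that differ in amplitude and phase, a carrier with an integer number of cycles in the interval, and Riemann sums in place of integrals.

```python
import numpy as np

def correlation_and_distance(fc=50, n=4000, T=1.0):
    """Compare direct and lowpass-equivalent computations for two signals."""
    dt = T / n
    t = np.arange(n) * dt
    g = np.sin(np.pi * t / T)                   # hypothetical pulse shape
    s_l1 = 1.0 * g                              # lowpass equivalent of signal 1
    s_l2 = 2.0 * np.exp(1j * np.pi / 3) * g     # lowpass equivalent of signal 2
    carrier = np.exp(2j * np.pi * fc * t)
    s1 = np.real(s_l1 * carrier)                # (4-2-42)
    s2 = np.real(s_l2 * carrier)
    e1 = np.sum(s1 ** 2) * dt                   # (4-2-43), bandpass form
    e2 = np.sum(s2 ** 2) * dt
    # Complex cross-correlation coefficient, (4-2-45).
    rho = np.sum(np.conj(s_l1) * s_l2) * dt / (2 * np.sqrt(e1 * e2))
    # Real normalized cross-correlation of the bandpass signals, (4-2-46).
    re_rho_direct = np.sum(s1 * s2) * dt / np.sqrt(e1 * e2)
    # Euclidean distance: definition and closed form, (4-2-48).
    d_direct = np.sqrt(np.sum((s1 - s2) ** 2) * dt)
    d_formula = np.sqrt(e1 + e2 - 2 * np.sqrt(e1 * e2) * rho.real)
    return rho, re_rho_direct, d_direct, d_formula

rho, re_rho_direct, d_direct, d_formula = correlation_and_distance()
print(rho, re_rho_direct, d_direct, d_formula)
```

With these choices the real part of `rho` matches the directly computed correlation, and the two distance values coincide.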

In the following section, we describe digitally modulated signals and make
use of the signal space representation for such signals. We shall observe that
digitally modulated signals, which are classified as linear, are conveniently
expanded in terms of two orthonormal basis functions of the form

$$f_1(t) = \sqrt{\frac{2}{T}}\cos 2\pi f_c t, \qquad f_2(t) = -\sqrt{\frac{2}{T}}\sin 2\pi f_c t \tag{4-2-50}$$

Hence, if $s_{lm}(t)$ is expressed as $s_{lm}(t) = x_l(t) + jy_l(t)$, it follows that $s_m(t)$ in
(4-2-42) may be expressed as

$$s_m(t) = x_l(t)f_1(t) + y_l(t)f_2(t) \tag{4-2-51}$$

where $x_l(t)$ and $y_l(t)$ represent the signal modulations.


4-3 REPRESENTATION OF DIGITALLY 
MODULATED SIGNALS 

In the transmission of digital information over a communications channel, the
modulator is the interface device that maps the digital information into analog
waveforms that match the characteristics of the channel. The mapping is
generally performed by taking blocks of $k = \log_2 M$ binary digits at a time from
the information sequence $\{a_n\}$ and selecting one of $M = 2^k$ deterministic, finite
energy waveforms $\{s_m(t), m = 1, 2, \ldots, M\}$ for transmission over the channel.

When the mapping from the digital sequence $\{a_n\}$ to waveforms is
performed under the constraint that a waveform transmitted in any time
interval depends on one or more previously transmitted waveforms, the
modulator is said to have memory. On the other hand, when the mapping
from the sequence $\{a_n\}$ to the waveforms $\{s_m(t)\}$ is performed without any
constraint on previously transmitted waveforms, the modulator is called
memoryless.

In addition to classifying the modulator as either memoryless or having 
memory, we may classify it as either linear or nonlinear. Linearity of a 
modulation method requires that the principle of superposition applies in the 
mapping of the digital sequence into successive waveforms. In nonlinear 
modulation, the superposition principle does not apply to signals transmitted in 
successive time intervals. We shall begin by describing memoryless modulation 
methods. 

4-3-1 Memoryless Modulation Methods 

As indicated above, the modulator in a digital communication system maps a 
sequence of binary digits into a set of corresponding signal waveforms. These 
waveforms may differ in either amplitude or in phase or in frequency, or some 
combination of two or more signal parameters. We consider each of these 
signal types separately, beginning with digital pulse amplitude modulation 
(PAM). In all cases, we assume that the sequence of binary digits at the input 
to the modulator occurs at a rate of R bits/s. 

Pulse Amplitude Modulated (PAM) Signals In digital PAM, the signal
waveforms may be represented as

$$\begin{aligned} s_m(t) &= \mathrm{Re}\left[A_m g(t)e^{j2\pi f_c t}\right] \\ &= A_m g(t)\cos 2\pi f_c t, \qquad m = 1, 2, \ldots, M, \quad 0 \le t \le T \end{aligned} \tag{4-3-1}$$

where $\{A_m, 1 \le m \le M\}$ denote the set of $M$ possible amplitudes corresponding
to $M = 2^k$ possible $k$-bit blocks or symbols. The signal amplitudes $A_m$ take the
discrete values (levels)

$$A_m = (2m - 1 - M)d, \qquad m = 1, 2, \ldots, M \tag{4-3-2}$$

where $2d$ is the distance between adjacent signal amplitudes. The waveform
$g(t)$ is a real-valued signal pulse whose shape influences the spectrum of the
transmitted signal, as we shall observe later. The symbol rate for the PAM
signal is $R/k$. This is the rate at which changes occur in the amplitude of the
carrier to reflect the transmission of new information. The time interval
$T_b = 1/R$ is called the bit interval and the time interval $T = k/R = kT_b$ is called
the symbol interval.

The $M$ PAM signals have energies

$$\begin{aligned} \mathcal{E}_m &= \int_0^T s_m^2(t)\,dt \\ &= \tfrac{1}{2}A_m^2\int_0^T g^2(t)\,dt \\ &= \tfrac{1}{2}A_m^2\mathcal{E}_g \end{aligned} \tag{4-3-3}$$

where $\mathcal{E}_g$ denotes the energy in the pulse $g(t)$. Clearly, these signals are
one-dimensional ($N = 1$), and, hence, are represented by the general form

$$s_m(t) = s_m f(t) \tag{4-3-4}$$

where $f(t)$ is defined as the unit-energy signal waveform given as

$$f(t) = \sqrt{\frac{2}{\mathcal{E}_g}}\,g(t)\cos 2\pi f_c t \tag{4-3-5}$$

and

$$s_m = A_m\sqrt{\tfrac{1}{2}\mathcal{E}_g}, \qquad m = 1, 2, \ldots, M \tag{4-3-6}$$

FIGURE 4-3-1 Signal space diagram for digital PAM signals.


The corresponding signal space diagrams for $M = 2$, $M = 4$ and $M = 8$ are
shown in Fig. 4-3-1. Digital PAM is also called amplitude-shift keying (ASK).

The mapping or assignment of $k$ information bits to the $M = 2^k$ possible
signal amplitudes may be done in a number of ways. The preferred assignment
is one in which the adjacent signal amplitudes differ by one binary digit as
illustrated in Fig. 4-3-1. This mapping is called Gray encoding. It is important
in the demodulation of the signal because the most likely errors caused by
noise involve the erroneous selection of an adjacent amplitude for the
transmitted signal amplitude. In such a case, only a single bit error occurs in
the $k$-bit sequence.

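We note that the Euclidean distance between any pair of signal points is

$$\begin{aligned} d_{mn}^{(e)} &= \sqrt{(s_m - s_n)^2} \\ &= \sqrt{\tfrac{1}{2}\mathcal{E}_g}\,|A_m - A_n| \\ &= d\sqrt{2\mathcal{E}_g}\,|m - n| \end{aligned} \tag{4-3-7}$$

Hence, the distance between a pair of adjacent signal points, i.e., the minimum
Euclidean distance, is

$$d_{\min}^{(e)} = d\sqrt{2\mathcal{E}_g} \tag{4-3-8}$$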
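The PAM relations (4-3-2), (4-3-6), and (4-3-8) can be tabulated with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and the binary-reflected Gray code used for the bit labels is one common Gray mapping, assumed here because the specific assignment of Fig. 4-3-1 is not reproduced in the text.

```python
import math

def pam_amplitudes(M, d=1.0):
    """Amplitude levels A_m = (2m - 1 - M) d for m = 1..M, as in (4-3-2)."""
    return [(2 * m - 1 - M) * d for m in range(1, M + 1)]

def pam_signal_points(M, d, pulse_energy):
    """Signal-space coordinates s_m = A_m sqrt(E_g / 2), as in (4-3-6)."""
    scale = math.sqrt(pulse_energy / 2.0)
    return [a * scale for a in pam_amplitudes(M, d)]

def gray_code(index):
    """Binary-reflected Gray code of a non-negative integer (assumed mapping)."""
    return index ^ (index >> 1)

M, d, pulse_energy = 4, 1.0, 2.0
k = int(math.log2(M))
for i, (a, s) in enumerate(zip(pam_amplitudes(M, d),
                               pam_signal_points(M, d, pulse_energy))):
    print(format(gray_code(i), "0{}b".format(k)), a, s)

# Minimum distance between adjacent points, to compare with (4-3-8).
points = pam_signal_points(M, d, pulse_energy)
print(min(b - a for a, b in zip(points, points[1:])), d * math.sqrt(2 * pulse_energy))
```

With this labeling, neighboring amplitude levels always differ in exactly one bit, so an error to an adjacent level costs a single bit error.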
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FIGURE 4-3-2 


The carrier-modulated PAM signal represented by (4-3-1) is a
double-sideband (DSB) signal and requires twice the channel bandwidth of the
equivalent lowpass signal for transmission. Alternatively, we may use
single-sideband (SSB) PAM, which has the representation (lower or upper sideband)

$$s_m(t) = \mathrm{Re}\left\{A_m[g(t) \pm j\hat{g}(t)]e^{j2\pi f_c t}\right\}, \qquad m = 1, 2, \ldots, M \tag{4-3-9}$$

where $\hat{g}(t)$ is the Hilbert transform of $g(t)$. Thus, the bandwidth of the SSB
signal is half that of the DSB signal.

The digital PAM signal is also appropriate for transmission over a channel
that does not require carrier modulation. In this case, the signal waveform may
be simply represented as

$$s_m(t) = A_m g(t), \qquad m = 1, 2, \ldots, M \tag{4-3-10}$$

This is now called a baseband signal. For example, a four-amplitude level
baseband PAM signal is illustrated in Fig. 4-3-2(a). The carrier-modulated
version of the signal is shown in Fig. 4-3-2(b).

In the special case of $M = 2$ signals, the binary PAM waveforms have the
special property that

$$s_1(t) = -s_2(t)$$

Hence, these two signals have the same energy and a cross-correlation
coefficient of $-1$. Such signals are called antipodal.

Baseband and bandpass PAM signals: (a) baseband PAM signal; (b) bandpass PAM signal.


Phase-Modulated Signals In digital phase modulation, the $M$ signal
waveforms are represented as

$$\begin{aligned}
s_m(t) &= \mathrm{Re}\left[g(t)e^{j2\pi(m-1)/M}e^{j2\pi f_c t}\right], \quad m = 1, 2, \ldots, M \\
&= g(t)\cos\left[2\pi f_c t + \frac{2\pi}{M}(m-1)\right] \\
&= g(t)\cos\frac{2\pi}{M}(m-1)\cos 2\pi f_c t - g(t)\sin\frac{2\pi}{M}(m-1)\sin 2\pi f_c t
\end{aligned} \tag{4-3-11}$$

where $g(t)$ is the signal pulse shape and $\theta_m = 2\pi(m-1)/M$, $m = 1, 2, \ldots, M$,
are the $M$ possible phases of the carrier that convey the transmitted
information. Digital phase modulation is usually called phase-shift keying
(PSK).

We note that these signal waveforms have equal energy, i.e.,

$$\mathcal{E} = \int_0^T s_m^2(t)\,dt = \frac{1}{2}\int_0^T g^2(t)\,dt = \frac{1}{2}\mathcal{E}_g \tag{4-3-12}$$


Furthermore, the signal waveforms may be represented as a linear combination
of two orthonormal signal waveforms, $f_1(t)$ and $f_2(t)$, i.e.,

$$s_m(t) = s_{m1}f_1(t) + s_{m2}f_2(t) \tag{4-3-13}$$

where

$$f_1(t) = \sqrt{\frac{2}{\mathcal{E}_g}}\,g(t)\cos 2\pi f_c t \tag{4-3-14}$$

$$f_2(t) = -\sqrt{\frac{2}{\mathcal{E}_g}}\,g(t)\sin 2\pi f_c t \tag{4-3-15}$$

and the two-dimensional vectors $\mathbf{s}_m = [s_{m1}\; s_{m2}]$ are given by

$$\mathbf{s}_m = \left[\sqrt{\frac{\mathcal{E}_g}{2}}\cos\frac{2\pi}{M}(m-1) \quad \sqrt{\frac{\mathcal{E}_g}{2}}\sin\frac{2\pi}{M}(m-1)\right], \quad m = 1, 2, \ldots, M \tag{4-3-16}$$

FIGURE 4-3-3 Signal space diagrams for PSK signals.


Signal space diagrams for $M = 2$, 4, and 8 are shown in Fig. 4-3-3. We note that
$M = 2$ corresponds to one-dimensional signals, which are identical to binary
PAM signals.

As in the case of PAM, the mapping or assignment of $k$ information bits to
the $M = 2^k$ possible phases may be done in a number of ways. The preferred
assignment is Gray encoding, so that the most likely errors caused by noise will
result in a single bit error in the $k$-bit symbol.

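The Euclidean distance between signal points is

$$d_{mn}^{(e)} = |\mathbf{s}_m - \mathbf{s}_n| = \left\{\mathcal{E}_g\left[1 - \cos\frac{2\pi}{M}(m - n)\right]\right\}^{1/2} \tag{4-3-17}$$

The minimum Euclidean distance corresponds to the case in which $|m - n| = 1$,
i.e., adjacent signal phases. In this case,

$$d_{\min}^{(e)} = \sqrt{\mathcal{E}_g\left(1 - \cos\frac{2\pi}{M}\right)} \tag{4-3-18}$$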
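The PSK signal vectors and the minimum-distance expression can be checked numerically. The sketch below is only an illustration: it builds the $M$ points of (4-3-16) for an assumed pulse energy $\mathcal{E}_g = 2$ and compares the smallest pairwise distance with (4-3-18). The helper names are hypothetical.

```python
import math


def psk_points(M: int, pulse_energy: float) -> list[tuple[float, float]]:
    """Signal vectors of (4-3-16): radius sqrt(E_g / 2), phases 2*pi*(m - 1)/M."""
    radius = math.sqrt(pulse_energy / 2.0)
    return [
        (radius * math.cos(2 * math.pi * i / M), radius * math.sin(2 * math.pi * i / M))
        for i in range(M)  # i plays the role of (m - 1) in the text
    ]


def min_distance(points) -> float:
    """Smallest Euclidean distance over all distinct pairs of points."""
    return min(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])


E_g = 2.0  # assumed pulse energy, chosen so that the radius equals 1
for M in (2, 4, 8):
    pts = psk_points(M, E_g)
    predicted = math.sqrt(E_g * (1 - math.cos(2 * math.pi / M)))
    print(M, round(min_distance(pts), 6), round(predicted, 6))
```

For each $M$ the two printed distances should agree, and every point has the same squared length $\mathcal{E}_g/2$, in line with (4-3-12).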


Quadrature Amplitude Modulation The bandwidth efficiency of PAM/SSB
can also be obtained by simultaneously impressing two separate $k$-bit
symbols from the information sequence $\{a_n\}$ on two quadrature carriers
$\cos 2\pi f_c t$ and $\sin 2\pi f_c t$. The resulting modulation technique is called quadrature
PAM or QAM, and the corresponding signal waveforms may be expressed as

$$\begin{aligned}
s_m(t) &= \mathrm{Re}\left[(A_{mc} + jA_{ms})g(t)e^{j2\pi f_c t}\right], \quad m = 1, 2, \ldots, M \\
&= A_{mc}\,g(t)\cos 2\pi f_c t - A_{ms}\,g(t)\sin 2\pi f_c t
\end{aligned} \tag{4-3-19}$$

where $A_{mc}$ and $A_{ms}$ are the information-bearing signal amplitudes of the
quadrature carriers and $g(t)$ is the signal pulse.

FIGURE 4-3-4

Alternatively, the QAM signal waveforms may be expressed as

$$s_m(t) = V_m\,g(t)\cos(2\pi f_c t + \theta_m) \tag{4-3-20}$$

where $V_m = \sqrt{A_{mc}^2 + A_{ms}^2}$ and $\theta_m = \tan^{-1}(A_{ms}/A_{mc})$. From this expression, it is
apparent that the QAM signal waveforms may be viewed as combined
amplitude and phase modulation.

In fact, we may select any combination of $M_1$-level PAM and $M_2$-phase PSK
to construct an $M = M_1M_2$ combined PAM-PSK signal constellation. If
$M_1 = 2^n$ and $M_2 = 2^m$, the combined PAM-PSK signal constellation results in
the simultaneous transmission of $m + n = \log_2 M_1M_2$ binary digits occurring at a
symbol rate $R/(m + n)$. Examples of signal space diagrams for combined
PAM-PSK are shown in Fig. 4-3-4, for $M = 8$ and $M = 16$.

As in the case of PSK signals, the QAM signal waveforms may be
represented as a linear combination of two orthonormal signal waveforms, $f_1(t)$
and $f_2(t)$, i.e.,

$$s_m(t) = s_{m1}f_1(t) + s_{m2}f_2(t) \tag{4-3-21}$$

where

$$f_1(t) = \sqrt{\frac{2}{\mathcal{E}_g}}\,g(t)\cos 2\pi f_c t, \qquad f_2(t) = -\sqrt{\frac{2}{\mathcal{E}_g}}\,g(t)\sin 2\pi f_c t \tag{4-3-22}$$

and

$$\mathbf{s}_m = [s_{m1}\; s_{m2}] = \left[A_{mc}\sqrt{\tfrac{1}{2}\mathcal{E}_g} \quad A_{ms}\sqrt{\tfrac{1}{2}\mathcal{E}_g}\right] \tag{4-3-23}$$

$\mathcal{E}_g$ is the energy of the signal pulse $g(t)$.

The Euclidean distance between any pair of signal vectors is

$$d_{mn}^{(e)} = |\mathbf{s}_m - \mathbf{s}_n| = \sqrt{\tfrac{1}{2}\mathcal{E}_g\left[(A_{mc} - A_{nc})^2 + (A_{ms} - A_{ns})^2\right]} \tag{4-3-24}$$

FIGURE 4-3-4 Examples of combined PAM-PSK signal space diagrams ($M = 8$ and $M = 16$).

FIGURE 4-3-5 Several signal space diagrams for rectangular QAM.

In the special case where the signal amplitudes take the set of discrete values
$\{(2m - 1 - M)d,\; m = 1, 2, \ldots, M\}$, the signal space diagram is rectangular, as
shown in Fig. 4-3-5. In this case, the Euclidean distance between adjacent
points, i.e., the minimum distance, is

$$d_{\min}^{(e)} = d\sqrt{2\mathcal{E}_g} \tag{4-3-25}$$

which is the same result as for PAM.
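A short numerical check of the rectangular case may be helpful. The sketch below assumes that both quadrature amplitudes are drawn from the same set $\{(2m - 1 - M)d\}$, so that `levels = 4` gives a $4 \times 4$ grid (16-QAM), and it uses an arbitrary pulse energy; the function names are hypothetical.

```python
import math
from itertools import product


def pam_amplitudes(levels: int, d: float) -> list[float]:
    """Amplitudes (2m - 1 - M) d for m = 1, ..., M, with M = levels."""
    return [(2 * m - 1 - levels) * d for m in range(1, levels + 1)]


def rectangular_qam(levels: int, d: float, pulse_energy: float):
    """Signal vectors [A_mc sqrt(E_g/2), A_ms sqrt(E_g/2)] as in (4-3-23)."""
    scale = math.sqrt(pulse_energy / 2.0)
    amps = pam_amplitudes(levels, d)
    return [(a_c * scale, a_s * scale) for a_c, a_s in product(amps, amps)]


def min_distance(points) -> float:
    """Smallest Euclidean distance over all distinct pairs of points."""
    return min(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])


d, E_g = 1.0, 2.0                      # assumed values, for illustration only
points = rectangular_qam(4, d, E_g)    # 16 points on a square grid
print(len(points))
print(min_distance(points), d * math.sqrt(2 * E_g))
```

The two numbers on the last line should coincide, which is the statement of (4-3-25).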


Multidimensional Signals It is apparent from the discussion above that the 
digital modulation of the carrier amplitude and phase allows us to construct 
signal waveforms that correspond to two-dimensional vectors and signal space 
diagrams. If we wish to construct signal waveforms corresponding to higher- 
dimensional vectors, we may use either the time domain or the frequency 
domain or both in order to increase the number of dimensions. 

Suppose we have $N$-dimensional signal vectors. For any $N$, we may
subdivide a time interval of length $T_1 = NT$ into $N$ subintervals of length
$T = T_1/N$. In each subinterval of length $T$, we may use binary PAM (a
one-dimensional signal) to transmit an element of the $N$-dimensional signal
vector. Thus, the $N$ time slots are used to transmit the $N$-dimensional signal
vector. If $N$ is even, a time slot of length $T$ may be used to simultaneously
transmit two components of the $N$-dimensional vector by modulating the
amplitude of quadrature carriers independently by the corresponding
components. In this manner, the $N$-dimensional signal vector is transmitted in
$\tfrac{1}{2}NT$ seconds ($\tfrac{1}{2}N$ time slots).

FIGURE 4-3-6 Subdivision of time and frequency axes into distinct slots.

Alternatively, a frequency band of width $N\,\Delta f$ may be subdivided into $N$
frequency slots each of width $\Delta f$. An $N$-dimensional signal vector can be
transmitted over the channel by simultaneously modulating the amplitude of $N$
carriers, one in each of the $N$ frequency slots. Care must be taken to provide
sufficient frequency separation $\Delta f$ between successive carriers so that there is
no cross talk interference among the signals on the $N$ carriers. If quadrature
carriers are used in each frequency slot, the $N$-dimensional vector (even $N$)
may be transmitted in $\tfrac{1}{2}N$ frequency slots, thus reducing the channel bandwidth
utilization by a factor of 2.

More generally, we may use both the time and frequency domains jointly to
transmit an $N$-dimensional signal vector. For example, Fig. 4-3-6 illustrates a
subdivision of the time and frequency axes into 12 slots. Thus, an $N = 12$-dimensional
signal vector may be transmitted by PAM or an $N = 24$-dimensional
signal vector may be transmitted by use of two quadrature carriers
(QAM) in each slot.

Orthogonal Multidimensional Signals As a special case of the construction
of multidimensional signals, let us consider the construction of $M$ equal-energy
orthogonal signal waveforms that differ in frequency, and are represented as

$$\begin{aligned}
s_m(t) &= \mathrm{Re}\left[s_{lm}(t)e^{j2\pi f_c t}\right], \quad m = 1, 2, \ldots, M, \quad 0 \le t \le T \\
&= \sqrt{\frac{2\mathcal{E}}{T}}\cos[2\pi f_c t + 2\pi m\,\Delta f\,t]
\end{aligned} \tag{4-3-26}$$

where the equivalent lowpass signal waveforms are defined as

$$s_{lm}(t) = \sqrt{\frac{2\mathcal{E}}{T}}\,e^{j2\pi m\,\Delta f\,t}, \quad m = 1, 2, \ldots, M, \quad 0 \le t \le T \tag{4-3-27}$$

This type of frequency modulation is called frequency-shift keying (FSK).

FIGURE 4-3-7


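These waveforms are characterized as having equal energy and
cross-correlation coefficients

$$\rho_{km} = \frac{2\mathcal{E}/T}{2\mathcal{E}}\int_0^T e^{j2\pi(m-k)\Delta f\,t}\,dt = \frac{\sin \pi T(m-k)\Delta f}{\pi T(m-k)\Delta f}\,e^{j\pi T(m-k)\Delta f} \tag{4-3-28}$$

The real part of $\rho_{km}$ is

$$\rho_r \equiv \mathrm{Re}(\rho_{km}) = \frac{\sin[\pi T(m-k)\Delta f]}{\pi T(m-k)\Delta f}\cos[\pi T(m-k)\Delta f] = \frac{\sin[2\pi T(m-k)\Delta f]}{2\pi T(m-k)\Delta f} \tag{4-3-29}$$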


First, we observe that $\mathrm{Re}(\rho_{km}) = 0$ when $\Delta f = 1/2T$ and $m \ne k$. Since
$|m - k| = 1$ corresponds to adjacent frequency slots, $\Delta f = 1/2T$ represents the
minimum frequency separation between adjacent signals for orthogonality of
the $M$ signals. Plots of $\mathrm{Re}(\rho_{km})$ versus $\Delta f$ and $|\rho_{km}|$ versus $\Delta f$ are shown in Fig.
4-3-7. Note that $|\rho_{km}| = 0$ for multiples of $1/T$ whereas $\mathrm{Re}(\rho_{km}) = 0$ for
multiples of $1/2T$.
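The zeros of the cross-correlation coefficient can be confirmed with a small numerical experiment. The sketch below evaluates the closed form of (4-3-28) for adjacent tones ($m - k = 1$) and, as a cross-check, a midpoint-rule estimate of the defining integral. The symbol interval $T = 1$ and the step count are assumptions made for illustration.

```python
import cmath
import math


def rho_closed_form(delta_f: float, T: float, m_minus_k: int = 1) -> complex:
    """Closed form of (4-3-28): (sin x / x) * exp(jx), x = pi*T*(m-k)*delta_f."""
    x = math.pi * T * m_minus_k * delta_f
    if x == 0.0:
        return 1.0 + 0.0j
    return (math.sin(x) / x) * cmath.exp(1j * x)


def rho_numeric(delta_f: float, T: float, m_minus_k: int = 1, steps: int = 2000) -> complex:
    """Midpoint-rule estimate of (1/T) * integral_0^T exp(j 2 pi (m-k) delta_f t) dt."""
    dt = T / steps
    total = sum(
        cmath.exp(2j * math.pi * m_minus_k * delta_f * (i + 0.5) * dt)
        for i in range(steps)
    )
    return total * dt / T


T = 1.0  # assumed symbol interval
for delta_f in (0.25 / T, 0.5 / T, 1.0 / T):
    rho = rho_closed_form(delta_f, T)
    print(f"delta_f={delta_f:.2f}  Re(rho)={rho.real:+.6f}  |rho|={abs(rho):.6f}")
```

At $\Delta f = 1/2T$ the real part vanishes while the magnitude does not; at $\Delta f = 1/T$ both vanish, as noted in the text.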


FIGURE 4-3-7 Cross-correlation coefficient as a function of frequency separation for FSK signals.

FIGURE 4-3-8 Orthogonal signals for $M = N = 3$ and $M = N = 2$.

For the case in which $\Delta f = 1/2T$, the $M$ FSK signals are equivalent to the
$N$-dimensional vectors

$$\begin{aligned}
\mathbf{s}_1 &= [\sqrt{\mathcal{E}}\;\; 0\;\; 0\;\; \cdots\;\; 0\;\; 0] \\
\mathbf{s}_2 &= [0\;\; \sqrt{\mathcal{E}}\;\; 0\;\; \cdots\;\; 0\;\; 0] \\
&\;\;\vdots \\
\mathbf{s}_N &= [0\;\; 0\;\; 0\;\; \cdots\;\; 0\;\; \sqrt{\mathcal{E}}]
\end{aligned} \tag{4-3-30}$$

where $N = M$. The distance between pairs of signals is

$$d_{km}^{(e)} = \sqrt{2\mathcal{E}} \quad \text{for all } m, k \tag{4-3-31}$$

which is also the minimum distance. Figure 4-3-8 illustrates the signal space
diagram for $M = N = 2$ and $M = N = 3$.

Biorthogonal Signals A set of $M$ biorthogonal signals can be constructed
from $\tfrac{1}{2}M$ orthogonal signals by simply including the negatives of the orthogonal
signals. Thus, we require $N = \tfrac{1}{2}M$ dimensions for the construction of a set of $M$
biorthogonal signals. Figure 4-3-9 illustrates the biorthogonal signals for $M = 4$
and 6.

We note that the correlation between any pair of waveforms is either
$\rho_r = -1$ or 0. The corresponding distances are $d = 2\sqrt{\mathcal{E}}$ or $\sqrt{2\mathcal{E}}$, with the latter
being the minimum distance.

FIGURE 4-3-9 Signal space diagrams for $M = 4$ and $M = 6$ biorthogonal signals.

Simplex Signals Suppose we have a set of $M$ orthogonal waveforms $\{s_m(t)\}$
or, equivalently, their vector representation $\{\mathbf{s}_m\}$. Their mean is

$$\bar{\mathbf{s}} = \frac{1}{M}\sum_{m=1}^{M}\mathbf{s}_m \tag{4-3-32}$$


Now, let us construct another set of $M$ signals by subtracting the mean from
each of the $M$ orthogonal signals. Thus,

$$\mathbf{s}_m' = \mathbf{s}_m - \bar{\mathbf{s}}, \quad m = 1, 2, \ldots, M \tag{4-3-33}$$

The effect of the subtraction is to translate the origin of the $M$ orthogonal
signals to the point $\bar{\mathbf{s}}$.

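The resulting signal waveforms are called simplex signals and have the
following properties. First, the energy per waveform is

$$|\mathbf{s}_m'|^2 = |\mathbf{s}_m - \bar{\mathbf{s}}|^2 = \mathcal{E} - \frac{2}{M}\mathcal{E} + \frac{1}{M}\mathcal{E} = \mathcal{E}\left(1 - \frac{1}{M}\right) \tag{4-3-34}$$

Second, the cross-correlation of any pair of signals is

$$\mathrm{Re}(\rho_{mn}) = \frac{\mathbf{s}_m' \cdot \mathbf{s}_n'}{|\mathbf{s}_m'|\,|\mathbf{s}_n'|} = \frac{-1/M}{1 - 1/M} = -\frac{1}{M - 1} \tag{4-3-35}$$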


for all $m$, $n$. Hence, the set of simplex waveforms is equally correlated and
requires less energy, by the factor $1 - 1/M$, than the set of orthogonal
waveforms. Since only the origin was translated, the distance between any pair
of signal points is maintained at $d = \sqrt{2\mathcal{E}}$, which is the same as the distance
between any pair of orthogonal signals.

Figure 4-3-10 illustrates the simplex signals for $M = 2$, 3, and 4. Note that
the signal dimensionality is $N = M - 1$.
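The three simplex properties can be verified directly from the construction. The following sketch builds $M$ orthogonal vectors, subtracts their mean, and prints the resulting energy, correlation, and distance next to the values predicted by (4-3-34), (4-3-35), and $\sqrt{2\mathcal{E}}$. The choice $M = 4$, $\mathcal{E} = 1$ and the helper names are assumptions for illustration.

```python
import math


def orthogonal_set(M: int, energy: float) -> list[list[float]]:
    """Rows of (4-3-30): sqrt(energy) on the diagonal, zeros elsewhere."""
    amp = math.sqrt(energy)
    return [[amp if i == j else 0.0 for j in range(M)] for i in range(M)]


def simplex_set(M: int, energy: float) -> list[list[float]]:
    """Subtract the mean vector from every orthogonal signal, as in (4-3-33)."""
    signals = orthogonal_set(M, energy)
    mean = [sum(column) / M for column in zip(*signals)]
    return [[x - mu for x, mu in zip(s, mean)] for s in signals]


def dot(u, v) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


M, energy = 4, 1.0  # assumed example values
s = simplex_set(M, energy)
energy_per_signal = dot(s[0], s[0])
correlation = dot(s[0], s[1]) / math.sqrt(dot(s[0], s[0]) * dot(s[1], s[1]))
distance = math.dist(s[0], s[1])
print(energy_per_signal, energy * (1 - 1 / M))
print(correlation, -1 / (M - 1))
print(distance, math.sqrt(2 * energy))
```

Although the vectors are stored with $M$ coordinates, they sum to zero and therefore span only $M - 1$ dimensions.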


Signal Waveforms from Binary Codes A set of $M$ signaling waveforms
can be generated from a set of $M$ binary code words of the form

$$\mathbf{C}_m = [c_{m1}\;\; c_{m2}\;\; \cdots\;\; c_{mN}], \quad m = 1, 2, \ldots, M \tag{4-3-36}$$

where $c_{mj} = 0$ or 1 for all $m$ and $j$. Each component of a code word is
mapped into an elementary binary PSK waveform as follows:

$$\begin{aligned}
c_{mj} = 1 &\Rightarrow s_{mj}(t) = \sqrt{\frac{2\mathcal{E}_c}{T_c}}\cos 2\pi f_c t \quad (0 \le t \le T_c) \\
c_{mj} = 0 &\Rightarrow s_{mj}(t) = -\sqrt{\frac{2\mathcal{E}_c}{T_c}}\cos 2\pi f_c t \quad (0 \le t \le T_c)
\end{aligned} \tag{4-3-37}$$

where $T_c = T/N$ and $\mathcal{E}_c = \mathcal{E}/N$. Thus, the $M$ code words $\{\mathbf{C}_m\}$ are mapped into
a set of $M$ waveforms $\{s_m(t)\}$.

FIGURE 4-3-10 Signal space diagrams for $M$-ary simplex signals.

FIGURE 4-3-11



The waveforms can be represented in vector form as

$$\mathbf{s}_m = [s_{m1}\;\; s_{m2}\;\; \cdots\;\; s_{mN}], \quad m = 1, 2, \ldots, M \tag{4-3-38}$$

where $s_{mj} = \pm\sqrt{\mathcal{E}/N}$ for all $m$ and $j$. $N$ is called the block length of the code,
and it is also the dimension of the $M$ waveforms.

We note that there are $2^N$ possible waveforms that can be constructed from
the $2^N$ possible binary code words. We may select a subset of $M < 2^N$ signal
waveforms for transmission of the information. We also observe that the $2^N$
possible signal points correspond to the vertices of an $N$-dimensional hypercube
with its center at the origin. Figure 4-3-11 illustrates the signal points in
$N = 2$ and 3 dimensions.

FIGURE 4-3-11 Signal space diagrams for signals generated from binary codes.

Each of the $M$ waveforms has energy $\mathcal{E}$. The cross-correlation between any
pair of waveforms depends on how we select the $M$ waveforms from the $2^N$
possible waveforms. This topic is treated in Chapter 7. Clearly, any adjacent
signal points have a cross-correlation coefficient


$$\rho_r = \frac{\mathcal{E}(1 - 2/N)}{\mathcal{E}} = \frac{N - 2}{N} \tag{4-3-39}$$

and a corresponding distance of

$$d^{(e)} = \sqrt{2\mathcal{E}(1 - \rho_r)} = \sqrt{4\mathcal{E}/N} \tag{4-3-40}$$

This concludes our discussion of memoryless modulation signals. 
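To make the hypercube picture concrete, the sketch below maps binary code words to vectors with components $\pm\sqrt{\mathcal{E}/N}$ and checks (4-3-39) and (4-3-40) for a pair of adjacent vertices. It assumes that "adjacent" means code words differing in exactly one position and that bit 1 maps to the positive amplitude; $N = 3$ and $\mathcal{E} = 1$ are arbitrary example values.

```python
import math
from itertools import product


def codeword_to_vector(codeword, energy: float) -> list[float]:
    """Map each bit to +sqrt(energy/N) (bit 1) or -sqrt(energy/N) (bit 0)."""
    N = len(codeword)
    amp = math.sqrt(energy / N)
    return [amp if bit == 1 else -amp for bit in codeword]


def correlation(u, v) -> float:
    """Normalized inner product of two signal vectors."""
    inner = sum(a * b for a, b in zip(u, v))
    return inner / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


N, energy = 3, 1.0  # assumed block length and waveform energy
vertices = [codeword_to_vector(c, energy) for c in product((0, 1), repeat=N)]
a = codeword_to_vector((0, 0, 0), energy)
b = codeword_to_vector((0, 0, 1), energy)  # differs from a in one position
print(len(vertices), 2 ** N)
print(correlation(a, b), (N - 2) / N)
print(math.dist(a, b), math.sqrt(4 * energy / N))
```

Each printed pair should match, and every vertex has squared length $\mathcal{E}$.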


4-3-2 Linear Modulation with Memory 

The modulation signals introduced in the previous section were classified as 
memoryless, because there was no dependence between signals transmitted in 
non-overlapping symbol intervals. In this section, we present some modulation 
signals in which there is dependence between the signals transmitted in 
successive symbol intervals. This signal dependence is usually introduced for 
the purpose of shaping the spectrum of the transmitted signal so that it 
matches the spectral characteristics of the channel. Signal dependence between 
signals transmitted in different signal intervals is generally accomplished by
encoding the data sequence at the input to the modulator by means of a 
modulation code, as described in Chapter 9. 

In this section, we shall present examples of modulation signals with 
memory and characterize their memory in terms of Markov chains. We shall 
confine our treatment to baseband signals. The generalization to bandpass 
signals is relatively straightforward. 

Figure 4-3-12 illustrates three different baseband signals and the corresponding
data sequence. The first signal, called NRZ, is the simplest. The
binary information digit 1 is represented by a rectangular pulse of polarity $A$
and the binary digit 0 is represented by a rectangular pulse of polarity $-A$.
Hence, the NRZ modulation is memoryless and is equivalent to a binary PAM
or a binary PSK signal in a carrier-modulated system.

FIGURE 4-3-12 Baseband signals: NRZ, NRZI, and delay modulation (Miller code).

The NRZI signal is different from the NRZ signal in that transitions from
one amplitude level to another occur only when a 1 is transmitted. The
amplitude level remains unchanged when a 0 is transmitted. This type of
signal encoding is called differential encoding. The encoding operation is
described mathematically by the relation

$$b_k = a_k \oplus b_{k-1} \tag{4-3-41}$$

where $\{a_k\}$ is the binary information sequence into the encoder, $\{b_k\}$ is the
output sequence of the encoder, and $\oplus$ denotes addition modulo 2. When
$b_k = 1$, the transmitted waveform is a rectangular pulse of amplitude $A$, and
when $b_k = 0$, the transmitted waveform is a rectangular pulse of amplitude $-A$.
Hence, the output of the encoder is mapped into one of two waveforms in
exactly the same manner as for the NRZ signal.
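The differential encoding rule is easy to exercise in code. The sketch below is a minimal illustration of $b_k = a_k \oplus b_{k-1}$ together with the matching decoder; the initial encoder state $b_{-1} = 0$ and the example data are assumptions, and the function names are hypothetical.

```python
def nrzi_encode(bits, initial_state=0):
    """Differential encoding b_k = a_k XOR b_{k-1}; initial_state is b_{-1}."""
    encoded, previous = [], initial_state
    for a in bits:
        previous = a ^ previous  # a 1 toggles the level, a 0 keeps it
        encoded.append(previous)
    return encoded


def nrzi_decode(encoded, initial_state=0):
    """Recover a_k = b_k XOR b_{k-1} from the encoded sequence."""
    decoded, previous = [], initial_state
    for b in encoded:
        decoded.append(b ^ previous)
        previous = b
    return decoded


def to_amplitudes(encoded, A=1.0):
    """Map b_k = 1 to +A and b_k = 0 to -A, as for the NRZ signal."""
    return [A if b == 1 else -A for b in encoded]


data = [1, 0, 1, 1, 0, 0, 0, 1]
b = nrzi_encode(data)
print(b)                 # [1, 1, 0, 1, 1, 1, 1, 0]
print(to_amplitudes(b))
print(nrzi_decode(b) == data)
```

The level changes exactly at the positions where the data bit is 1, which is the memory that the state diagram captures.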

The differential encoding operation introduces memory in the signal. The
combination of the encoder and the modulator operations may be represented
by a state diagram (a Markov chain) as shown in Fig. 4-3-13. The state diagram
may be described by two transition matrices corresponding to the two possible
input bits $\{0, 1\}$. We note that when $a_k = 0$, the encoder stays in the same state.
Hence, the state transition matrix for a zero is simply

$$\mathbf{T}_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \tag{4-3-42}$$

where $t_{ij} = 1$ if $a_k$ results in a transition from state $i$ to state $j$, $i = 1, 2$, and $j = 1, 2$;
otherwise, $t_{ij} = 0$. Similarly, the state transition matrix for $a_k = 1$ is

$$\mathbf{T}_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \tag{4-3-43}$$

Thus, these two state transition matrices characterize the NRZI signal.

Another way to display the memory introduced by the precoding operation
is by means of a trellis diagram. The trellis diagram for the NRZI signal is
illustrated in Fig. 4-3-14. The trellis provides exactly the same information
concerning the signal dependence as the state diagram, but also depicts a time
evolution of the state transitions.

FIGURE 4-3-13 State diagram for the NRZI signal.

FIGURE 4-3-14 The trellis diagram for the NRZI signal.

FIGURE 4-3-15

The signal generated by delay modulation also has memory. As shown in
Chapter 9, delay modulation is equivalent to encoding the data sequence by a
run-length-limited code called a Miller code and using NRZI to transmit the
encoded data. This type of digital modulation has been used extensively for
digital magnetic recording and in carrier modulation systems employing binary
PSK. The signal may be described by a state diagram that has four states as
shown in Fig. 4-3-15(a). There are two elementary waveforms $s_1(t)$ and $s_2(t)$
and their negatives $-s_1(t)$ and $-s_2(t)$, which are used for transmitting the
binary information. These waveforms are illustrated in Fig. 4-3-15(b). The
mapping from bits to corresponding waveforms is illustrated in the state
diagram. The state transition matrices that characterize the memory of this
encoding and modulation method are easily obtained from the state diagram in
Fig. 4-3-15. When $a_k = 0$, we have

$$\mathbf{T}_1 = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} \tag{4-3-44}$$

and when $a_k = 1$, the transition matrix is

$$\mathbf{T}_2 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \tag{4-3-45}$$

Thus, these two $4 \times 4$ state transition matrices characterize the state diagram
for the Miller-encoded signal.

FIGURE 4-3-15 State diagram (a) and basic waveforms (b) for delay modulated (Miller-encoded) signal.

Modulation techniques with memory such as NRZI and Miller coding are
generally characterized by a $K$-state Markov chain with stationary state
probabilities $\{p_i,\; i = 1, 2, \ldots, K\}$ and transition probabilities $\{p_{ij},\; i, j = 1, 2, \ldots, K\}$.
Associated with each transition is a signal waveform $s_j(t)$,
$j = 1, 2, \ldots, K$. Thus, the transition probability $p_{ij}$ denotes the probability that
signal waveform $s_j(t)$ is transmitted in a given signaling interval after the
transmission of the signal waveform $s_i(t)$ in the previous signaling interval. The
transition probabilities may be arranged in matrix form as

$$\mathbf{P} = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1K} \\ p_{21} & p_{22} & \cdots & p_{2K} \\ \vdots & \vdots & & \vdots \\ p_{K1} & p_{K2} & \cdots & p_{KK} \end{bmatrix} \tag{4-3-46}$$

where $\mathbf{P}$ is called the transition probability matrix.

The transition probability matrix is easily obtained from the transition
matrices $\{\mathbf{T}_i\}$ and the corresponding probabilities of occurrence of the input
bits (or, equivalently, the stationary state transition probabilities $\{p_i\}$). The
general relationship may be expressed as

$$\mathbf{P} = \sum_{i=1}^{2} q_i\mathbf{T}_i \tag{4-3-47}$$

where $q_1 = P(a_k = 0)$ and $q_2 = P(a_k = 1)$.

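For the NRZI signal with equal state probabilities $p_1 = p_2 = \tfrac{1}{2}$ and transition
matrices given by (4-3-42) and (4-3-43), the transition probability matrix is

$$\mathbf{P} = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix} \tag{4-3-48}$$

Similarly, the transition probability matrix for the Miller-coded signal with
equally likely symbols ($q_1 = q_2 = \tfrac{1}{2}$ or, equivalently, $p_1 = p_2 = p_3 = p_4 = \tfrac{1}{4}$) is

$$\mathbf{P} = \begin{bmatrix} 0 & \tfrac{1}{2} & 0 & \tfrac{1}{2} \\ 0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 \\ \tfrac{1}{2} & 0 & \tfrac{1}{2} & 0 \end{bmatrix} \tag{4-3-49}$$

The transition probability matrix is useful in the determination of the spectral
characteristics of digital modulation techniques with memory, as we shall
observe in Section 4-4.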
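The relationship $\mathbf{P} = q_1\mathbf{T}_1 + q_2\mathbf{T}_2$ can be evaluated mechanically. The sketch below does so with exact fractions for the NRZI and Miller transition matrices given above and also checks that the uniform state distribution is stationary for the Miller chain. The variable and function names are hypothetical.

```python
from fractions import Fraction

# State transition matrices from (4-3-42) to (4-3-45).
T1_NRZI = [[1, 0], [0, 1]]
T2_NRZI = [[0, 1], [1, 0]]
T1_MILLER = [[0, 0, 0, 1], [0, 0, 0, 1], [1, 0, 0, 0], [1, 0, 0, 0]]
T2_MILLER = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 0]]


def transition_probability_matrix(T1, T2, q1, q2):
    """P = q1*T1 + q2*T2, with q1 = P(a_k = 0) and q2 = P(a_k = 1)."""
    return [[q1 * a + q2 * b for a, b in zip(r1, r2)] for r1, r2 in zip(T1, T2)]


def left_multiply(p, P):
    """Row vector times matrix: the state distribution after one transition."""
    return [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(P[0]))]


half = Fraction(1, 2)
P_nrzi = transition_probability_matrix(T1_NRZI, T2_NRZI, half, half)
P_miller = transition_probability_matrix(T1_MILLER, T2_MILLER, half, half)

for row in P_miller:
    print([str(x) for x in row])

uniform = [Fraction(1, 4)] * 4
print(left_multiply(uniform, P_miller) == uniform)  # stationarity check
```

Unequal input probabilities can be explored by changing `q1` and `q2`, provided they sum to 1.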


4-3-3 Nonlinear Modulation Methods with Memory

In this section, we consider a class of digital modulation methods in which the 
phase of the signal is constrained to be continuous. This constraint results in a 
phase or frequency modulator that has memory. The modulation method is 
also nonlinear. 

Continuous-Phase FSK (CPFSK) A conventional FSK signal is generated
by shifting the carrier by an amount $f_n = \tfrac{1}{2}\Delta f\,I_n$, $I_n = \pm 1, \pm 3, \ldots, \pm(M - 1)$, to
reflect the digital information that is being transmitted. This type of FSK signal
was described in Section 4-3-1, and it is memoryless. The switching from one
frequency to another may be accomplished by having $M = 2^k$ separate
oscillators tuned to the desired frequencies and selecting one of the $M$
frequencies according to the particular $k$-bit symbol that is to be transmitted in
a signal interval of duration $T = k/R$ seconds. However, such abrupt switching
from one oscillator output to another in successive signaling intervals results in 
relatively large spectral side lobes outside of the main spectral band of the 
signal and, consequently, this method requires a large frequency band for 
transmission of the signal. 

To avoid the use of signals having large spectral side lobes, the information- 
bearing signal frequency modulates a single carrier whose frequency is changed 
continuously. The resulting frequency-modulated signal is phase-continuous 
and, hence, it is called continuous-phase FSK (CPFSK). This type of FSK
signal has memory because the phase of the carrier is constrained to be 
continuous. 

In order to represent a CPFSK signal, we begin with a PAM signal

$$d(t) = \sum_n I_n g(t - nT) \tag{4-3-50}$$

where $\{I_n\}$ denotes the sequence of amplitudes obtained by mapping $k$-bit
blocks of binary digits from the information sequence $\{a_n\}$ into the amplitude
levels $\pm 1, \pm 3, \ldots, \pm(M - 1)$ and $g(t)$ is a rectangular pulse of amplitude $1/2T$
and duration $T$ seconds. The signal $d(t)$ is used to frequency-modulate the
carrier. Consequently, the equivalent lowpass waveform $v(t)$ is expressed as

$$v(t) = \sqrt{\frac{2\mathcal{E}}{T}}\exp\left\{j\left[4\pi Tf_d\int_{-\infty}^{t} d(\tau)\,d\tau + \phi_0\right]\right\} \tag{4-3-51}$$

where $f_d$ is the peak frequency deviation and $\phi_0$ is the initial phase of the
carrier.

The carrier-modulated signal corresponding to (4-3-51) may be expressed as

$$s(t) = \sqrt{\frac{2\mathcal{E}}{T}}\cos[2\pi f_c t + \phi(t; \mathbf{I}) + \phi_0] \tag{4-3-52}$$

where $\phi(t; \mathbf{I})$ represents the time-varying phase of the carrier, which is defined
as

$$\phi(t; \mathbf{I}) = 4\pi Tf_d\int_{-\infty}^{t} d(\tau)\,d\tau = 4\pi Tf_d\int_{-\infty}^{t}\left[\sum_n I_n g(\tau - nT)\right]d\tau \tag{4-3-53}$$

Note that, although $d(t)$ contains discontinuities, the integral of $d(t)$ is
continuous. Hence, we have a continuous-phase signal. The phase of the
carrier in the interval $nT \le t \le (n + 1)T$ is determined by integrating (4-3-53).
Thus,

$$\begin{aligned}
\phi(t; \mathbf{I}) &= 2\pi f_d T\sum_{k=-\infty}^{n-1} I_k + 2\pi f_d (t - nT) I_n \\
&= \theta_n + 2\pi h I_n q(t - nT)
\end{aligned} \tag{4-3-54}$$

where $h$, $\theta_n$, and $q(t)$ are defined as

$$h = 2f_d T \tag{4-3-55}$$

$$\theta_n = \pi h\sum_{k=-\infty}^{n-1} I_k \tag{4-3-56}$$

$$q(t) = \begin{cases} 0 & (t < 0) \\ t/2T & (0 \le t \le T) \\ \tfrac{1}{2} & (t > T) \end{cases} \tag{4-3-57}$$

We observe that $\theta_n$ represents the accumulation (memory) of all symbols up to
time $(n - 1)T$. The parameter $h$ is called the modulation index.
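The phase expression (4-3-54) lends itself to a direct numerical check of continuity. The sketch below is a simplified illustration that assumes the symbol sequence starts at $k = 0$ with zero accumulated phase before it (the text sums from $-\infty$); the function names and the example data are hypothetical.

```python
import math


def q(t: float, T: float) -> float:
    """Phase pulse of (4-3-57): 0 for t < 0, t/2T on [0, T], 1/2 for t > T."""
    if t < 0:
        return 0.0
    if t <= T:
        return t / (2 * T)
    return 0.5


def cpfsk_phase(t: float, symbols, h: float, T: float) -> float:
    """phi(t; I) = theta_n + 2*pi*h*I_n*q(t - nT) for 0 <= t <= len(symbols)*T."""
    n = min(int(t // T), len(symbols) - 1)      # index of the current symbol
    theta_n = math.pi * h * sum(symbols[:n])    # memory of all earlier symbols
    return theta_n + 2 * math.pi * h * symbols[n] * q(t - n * T, T)


symbols = [1, -1, -1, 1]  # assumed binary data, I_n = +/-1
h, T = 0.5, 1.0
for n in range(1, len(symbols)):
    left = cpfsk_phase(n * T - 1e-9, symbols, h, T)
    right = cpfsk_phase(n * T, symbols, h, T)
    print(n, round(left, 6), round(right, 6))
```

The phase just before and just after each symbol boundary agrees, even though the modulating signal $d(t)$ jumps there.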

Continuous-Phase Modulation (CPM) When expressed in the form of
(4-3-54), CPFSK becomes a special case of a general class of continuous-phase
modulated (CPM) signals in which the carrier phase is

$$\phi(t; \mathbf{I}) = 2\pi\sum_{k=-\infty}^{n} I_k h_k q(t - kT), \quad nT \le t \le (n + 1)T \tag{4-3-58}$$

where $\{I_k\}$ is the sequence of $M$-ary information symbols selected from the
alphabet $\pm 1, \pm 3, \ldots, \pm(M - 1)$, $\{h_k\}$ is a sequence of modulation indices, and
$q(t)$ is some normalized waveform shape.

When $h_k = h$ for all $k$, the modulation index is fixed for all symbols. When
the modulation index varies from one symbol to another, the CPM signal is
called multi-h. In such a case, the $\{h_k\}$ are made to vary in a cyclic manner
through a set of indices.

The waveform $q(t)$ may be represented in general as the integral of some
pulse $g(t)$, i.e.,

$$q(t) = \int_0^t g(\tau)\,d\tau \tag{4-3-59}$$

If $g(t) = 0$ for $t > T$, the CPM signal is called full response CPM. If $g(t) \ne 0$ for
$t > T$, the modulated signal is called partial response CPM. Figure 4-3-16
illustrates several pulse shapes for $g(t)$, and the corresponding $q(t)$. It is
apparent that an infinite variety of CPM signals can be generated by choosing
different pulse shapes $g(t)$ and by varying the modulation index $h$ and the
alphabet size $M$.

FIGURE 4-3-16 Pulse shapes for full response CPM (a, b) and partial response CPM (c, d).

It is instructive to sketch the set of phase trajectories $\phi(t; \mathbf{I})$ generated by all
possible values of the information sequence $\{I_n\}$. For example, in the case of
CPFSK with binary symbols $I_n = \pm 1$, the set of phase trajectories beginning at
time $t = 0$ is shown in Fig. 4-3-17. For comparison, the phase trajectories for
quaternary CPFSK are illustrated in Fig. 4-3-18. These phase diagrams are
called phase trees. We observe that the phase trees for CPFSK are piecewise
linear as a consequence of the fact that the pulse $g(t)$ is rectangular. Smoother
phase trajectories and phase trees are obtained by using pulses that do not
contain discontinuities, such as the class of raised cosine pulses. For example, a
phase trajectory generated by the sequence $(1, -1, -1, -1, 1, 1, -1, 1)$ for a
partial response, raised cosine pulse of length $3T$ is illustrated in Fig. 4-3-19.
For comparison, the corresponding phase trajectory generated by CPFSK is
also shown.

FIGURE 4-3-18 Phase trajectory for quaternary CPFSK.

The phase trees shown in these figures grow with time. However, the phase
of the carrier is unique only in the range from $\phi = 0$ to $\phi = 2\pi$ or, equivalently,
from $\phi = -\pi$ to $\phi = \pi$. When the phase trajectories are plotted modulo $2\pi$, say
in the range $(-\pi, \pi)$, the phase tree collapses into a structure called a phase
trellis. To properly view the phase trellis diagram, we may plot the two
quadrature components $x_c(t; \mathbf{I}) = \cos\phi(t; \mathbf{I})$ and $x_s(t; \mathbf{I}) = \sin\phi(t; \mathbf{I})$ as
functions of time. Thus, we generate a three-dimensional plot in which the
quadrature components $x_c$ and $x_s$ appear on the surface of a cylinder of unit
radius. For example, Fig. 4-3-20 illustrates the phase trellis or phase cylinder
obtained with binary modulation, a modulation index $h = \tfrac{1}{2}$, and a raised
cosine pulse of length $3T$.

FIGURE 4-3-19 Phase trajectories for binary CPFSK (dashed) and binary, partial response CPM based on raised cosine pulse of length $3T$ (solid). [From Sundberg (1986), © 1986 IEEE.]

FIGURE 4-3-20

Simpler representations for the phase trajectories can be obtained by
displaying only the terminal values of the signal phase at the time instants
$t = nT$. In this case, we restrict the modulation index of the CPM signal to be
rational. In particular, let us assume that $h = m/p$, where $m$ and $p$ are
relatively prime integers. Then, a full response CPM signal at the time instants
$t = nT$ will have the terminal phase states

$$\left\{0, \frac{\pi m}{p}, \frac{2\pi m}{p}, \ldots, \frac{(p - 1)\pi m}{p}\right\} \tag{4-3-60}$$

when $m$ is even and

$$\left\{0, \frac{\pi m}{p}, \frac{2\pi m}{p}, \ldots, \frac{(2p - 1)\pi m}{p}\right\} \tag{4-3-61}$$

when $m$ is odd. Hence, there are $p$ terminal phase states when $m$ is even and
$2p$ states when $m$ is odd. On the other hand, when the pulse shape extends
over $L$ symbol intervals (partial response CPM), the number of phase states
may increase up to a maximum of $S_t$, where

$$S_t = \begin{cases} pM^{L-1} & (\text{even } m) \\ 2pM^{L-1} & (\text{odd } m) \end{cases} \tag{4-3-62}$$

where $M$ is the alphabet size. For example, the binary CPFSK signal (full
response, rectangular pulse) with $h = \tfrac{1}{2}$ has $S_t = 4$ (terminal) phase states. The
state trellis for this signal is illustrated in Fig. 4-3-21. We emphasize that the
phase transitions from one state to another are not true phase trajectories.
They represent phase transitions for the (terminal) states at the time instants
$t = nT$.

FIGURE 4-3-21 State trellis for binary CPFSK with $h = \tfrac{1}{2}$.
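The counting rule for terminal phase states can be illustrated with exact rational arithmetic. The sketch below lists the states of (4-3-60) and (4-3-61), reduced modulo $2\pi$ and expressed in units of $\pi$, and evaluates the bound (4-3-62). It relies on Python's `Fraction` keeping $m$ and $p$ relatively prime; the function names are hypothetical.

```python
from fractions import Fraction


def terminal_phase_states(h: Fraction) -> list[Fraction]:
    """Terminal phases (in units of pi, modulo 2) for full response CPM, h = m/p."""
    m, p = h.numerator, h.denominator      # Fraction stores m/p in lowest terms
    count = p if m % 2 == 0 else 2 * p     # p states for even m, 2p for odd m
    return sorted({Fraction(k * m, p) % 2 for k in range(count)})


def max_states(h: Fraction, M: int, L: int) -> int:
    """Upper bound S_t of (4-3-62) for a pulse spanning L symbol intervals."""
    m, p = h.numerator, h.denominator
    return (p if m % 2 == 0 else 2 * p) * M ** (L - 1)


for h in (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)):
    states = terminal_phase_states(h)
    print(h, [str(s) for s in states], max_states(h, M=2, L=1))
```

For $h = \tfrac{1}{2}$ the states are $0, \pi/2, \pi, 3\pi/2$, in agreement with the four-state example in the text.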

An alternative representation to the state trellis is the state diagram, which
also illustrates the state transitions at the time instants $t = nT$. This is an even
more compact representation of the CPM signal characteristics. Only the
possible (terminal) phase states and their transitions are displayed in the state
diagram. Time does not appear explicitly as a variable. For example, the state
diagram for the CPFSK signal with $h = \tfrac{1}{2}$ is shown in Fig. 4-3-22.


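FIGURE 4-3-22 State diagram for binary CPFSK with $h = \tfrac{1}{2}$.

Minimum-Shift Keying (MSK) MSK is a special form of binary CPFSK
(and, therefore, CPM) in which the modulation index $h = \tfrac{1}{2}$. The phase of the
carrier in the interval $nT \le t \le (n + 1)T$ is

$$\begin{aligned}
\phi(t; \mathbf{I}) &= \tfrac{1}{2}\pi\sum_{k=-\infty}^{n-1} I_k + \pi I_n q(t - nT) \\
&= \theta_n + \tfrac{1}{2}\pi I_n\left(\frac{t - nT}{T}\right), \quad nT \le t \le (n + 1)T
\end{aligned} \tag{4-3-63}$$

and the modulated carrier signal is

$$\begin{aligned}
s(t) &= A\cos\left[2\pi f_c t + \theta_n + \tfrac{1}{2}\pi I_n\left(\frac{t - nT}{T}\right)\right] \\
&= A\cos\left[2\pi\left(f_c + \frac{1}{4T}I_n\right)t - \tfrac{1}{2}n\pi I_n + \theta_n\right], \quad nT \le t \le (n + 1)T
\end{aligned} \tag{4-3-64}$$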
The expression (4-3-64) indicates that the binary CPFSK signal can be
expressed as a sinusoid having one of two possible frequencies in the interval
$nT \le t \le (n + 1)T$. If we define these frequencies as

$$f_1 = f_c - \frac{1}{4T}, \qquad f_2 = f_c + \frac{1}{4T} \tag{4-3-65}$$

then the binary CPFSK signal given by (4-3-64) may be written in the form

$$s_i(t) = A\cos\left[2\pi f_i t + \theta_n + \tfrac{1}{2}n\pi(-1)^{i-1}\right], \quad i = 1, 2 \tag{4-3-66}$$

The frequency separation is $\Delta f = f_2 - f_1 = 1/2T$. Recall that $\Delta f = 1/2T$ is the
minimum frequency separation that is necessary to ensure the orthogonality of
the signals $s_1(t)$ and $s_2(t)$ over a signaling interval of length $T$. This explains
why binary CPFSK with $h = \tfrac{1}{2}$ is called minimum-shift keying (MSK). The
phase in the $n$th signaling interval is the phase state of the signal that results in
phase continuity between adjacent intervals.

MSK may also be represented as a form of four-phase PSK. Specifically, we
may express the equivalent lowpass digitally modulated signal in the form (see
Problem 4-14)

$$v(t) = \sum_{n=-\infty}^{\infty}\left[I_{2n}\,g(t - 2nT) - jI_{2n+1}\,g(t - 2nT - T)\right] \tag{4-3-67}$$

where $g(t)$ is a sinusoidal pulse defined as

$$g(t) = \begin{cases} \sin\dfrac{\pi t}{2T} & (0 \le t \le 2T) \\ 0 & (\text{otherwise}) \end{cases} \tag{4-3-68}$$

FIGURE 4-3-23


Thus, this type of signal is viewed as a four-phase PSK signal in which the
pulse shape is one-half cycle of a sinusoid. The even-numbered binary-valued
($\pm 1$) symbols $\{I_{2n}\}$ of the information sequence $\{I_n\}$ are transmitted via the
cosine of the carrier, while the odd-numbered symbols $\{I_{2n+1}\}$ are transmitted
via the sine of the carrier. The transmission rate on the two orthogonal carrier
components is $1/2T$ bits per second so that the combined transmission rate is
$1/T$ bits/s. Note that the bit transitions on the sine and cosine carrier
components are staggered or offset in time by $T$ seconds. For this reason, the
signal

$$s(t) = A\left\{\left[\sum_{n=-\infty}^{\infty} I_{2n}\,g(t - 2nT)\right]\cos 2\pi f_c t + \left[\sum_{n=-\infty}^{\infty} I_{2n+1}\,g(t - 2nT - T)\right]\sin 2\pi f_c t\right\} \tag{4-3-69}$$

is called offset quadrature PSK (OQPSK) or staggered quadrature PSK
(SQPSK).

Figure 4-3-23 illustrates the representation of the MSK signals as two 
staggered quadrature-modulated binary PSK signals. The corresponding sum 
of the two quadrature signals is a constant amplitude, frequency-modulated 
signal. 
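The constant-amplitude property can be checked numerically from the lowpass form (4-3-67). The sketch below is a finite-sum illustration: it uses an assumed data block $I_0, \ldots, I_{2N-1}$ and samples only the interval $T < t < 2NT$ in which both quadrature components are active; the names and the sampling step are hypothetical.

```python
import math


def g(t: float, T: float) -> float:
    """Half-cycle sinusoidal pulse of (4-3-68), nonzero only on [0, 2T]."""
    return math.sin(math.pi * t / (2 * T)) if 0.0 <= t <= 2 * T else 0.0


def msk_lowpass(t: float, symbols, T: float) -> complex:
    """Finite-sum version of (4-3-67) for symbols I_0, ..., I_{2N-1}."""
    total = 0j
    for n in range(len(symbols) // 2):
        total += symbols[2 * n] * g(t - 2 * n * T, T)               # in-phase term
        total -= 1j * symbols[2 * n + 1] * g(t - 2 * n * T - T, T)  # staggered quadrature term
    return total


T = 1.0
symbols = [1, -1, -1, 1, 1, 1, -1, 1]   # assumed +/-1 data, 2N = 8 symbols
step = 0.01
times = [T + (i + 0.5) * step for i in range(700)]   # samples inside (T, 8T)
envelope = [abs(msk_lowpass(t, symbols, T)) for t in times]
print(min(envelope), max(envelope))
```

Both printed values should be 1 to within rounding, because the two staggered half-sinusoids behave as $\sin$ and $\cos$ of the same argument at every instant.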

It is also interesting to compare the waveforms for MSK with offset QPSK 
in which the pulse g(t) is rectangular for 0^/«27, and with conventional 



Representation of MSK signal as a form of two 
staggered binary PSK signals, each with a 
sinusoidal envelope. 



0 IT AT tiT 8 T 

|e>) Quadrature signal component 



0 T 2T IT AT 5T 6 T IT 
<c) MSK signal [sum of (til and [b)\ 
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FIGURE 4-)-24 


-90° phase shift +90° phase shift 



-90° phase shift +90° phase shift 



Signal waveforms for (a) MSK, (b) offset QPSK (rectangular pulse), and (c) conventional QPSK 
(rectangular pulse). [From Gronemeyer and McBride {1976); © 1976 IEEE.] 


quadrature (four-phase) PSK (QPSK) in which the pulse g(r) is rectangular for 
0«r *527. Clearly, all three of the modulation methods result in identical data 
rates. The MSK signal has continuous phase. The offset QPSK signal with a 
rectangular pulse is basically two binary PSK signals for which the phase 
transitions are staggered in time by T seconds. Thus, the signal contains phase 
jumps of ±90° that may occur as often as every T seconds. On the other hand, 
the conventional four-phase PSK signal with constant amplitude will contain 
phase jumps of ±180° or ±90° every IT seconds. An illustration of these three 
signal types is given in Fig. 4-3-24. 
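The constant-amplitude property of the staggered representation can be checked numerically. The following is a minimal sketch, assuming an arbitrary illustrative symbol sequence and $T = 1$; the function names are hypothetical and are not taken from the text. It evaluates the two bracketed sums of (4-3-69) with the half-sine pulse of (4-3-68) and forms the envelope from the in-phase and quadrature components.

```python
import math

def half_sine_pulse(t, T):
    """g(t) of (4-3-68): half cycle of a sinusoid on [0, 2T], zero elsewhere."""
    return math.sin(math.pi * t / (2.0 * T)) if 0.0 <= t <= 2.0 * T else 0.0

def quadrature_components(symbols, t, T):
    """Bracketed sums of (4-3-69): even-indexed symbols ride the cosine
    carrier, odd-indexed symbols ride the sine carrier, offset by T."""
    pairs = len(symbols) // 2
    in_phase = sum(symbols[2 * n] * half_sine_pulse(t - 2 * n * T, T)
                   for n in range(pairs))
    quadrature = sum(symbols[2 * n + 1] * half_sine_pulse(t - 2 * n * T - T, T)
                     for n in range(pairs))
    return in_phase, quadrature

def envelope(symbols, t, T):
    """Magnitude of the equivalent lowpass signal (A = 1)."""
    i, q = quadrature_components(symbols, t, T)
    return math.hypot(i, q)

# Illustrative (assumed) data: eight binary symbols, T = 1.
symbols = [1, -1, -1, 1, 1, 1, -1, -1]
T = 1.0
# Sample where both components are present (after the first offset of T).
times = [T + 0.05 * k for k in range(140)]
envelopes = [envelope(symbols, t, T) for t in times]
```

Under these assumptions every entry of `envelopes` equals 1 to within rounding, because the in-phase and quadrature pulse magnitudes are $|\sin(\pi t/2T)|$ and $|\cos(\pi t/2T)|$ at each instant.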

Signal Space Diagrams for CPM In general, continuous-phase signals 
cannot be represented by discrete points in signal space as in the case of PAM, 
PSK, and QAM, because the phase of the carrier is time-variant. Instead, a 
continuous-phase signal is described by the various paths or trajectories from 
one phase state to another. For a constant-amplitude CPM signal, the various 
trajectories form a circle.

FIGURE 4-3-25 Signal space diagram for CPFSK.

For example, Fig. 4-3-25 illustrates the signal space (phase trajectory)
diagram for CPFSK signals with $h = \tfrac14$, $h = \tfrac13$, $h = \tfrac12$, and $h = \tfrac23$. The
beginning and ending points of these phase trajectories are marked in the
figure by dots. Note that the length of the phase trajectory increases with an
increase in $h$. An increase in $h$ also results in an increase of the signal
bandwidth, as demonstrated in the following section.

Multiamplitude CPM Multiamplitude CPM is a generalization of ordinary
CPM in which the signal amplitude is allowed to vary over a set of amplitude
values while the phase of the signal is constrained to be continuous. For
example, let us consider a two-amplitude CPFSK signal, which may be
represented as

$$s(t) = 2A\cos[2\pi f_c t + \phi_2(t;\mathbf{I})] + A\cos[2\pi f_c t + \phi_1(t;\mathbf{J})] \tag{4-3-70}$$

where

$$\phi_2(t;\mathbf{I}) = \pi h\sum_{k=-\infty}^{n-1} I_k + \pi h I_n\,\frac{t-nT}{T}, \qquad nT \le t \le (n+1)T \tag{4-3-71}$$

$$\phi_1(t;\mathbf{J}) = \pi h\sum_{k=-\infty}^{n-1} J_k + \pi h J_n\,\frac{t-nT}{T}, \qquad nT \le t \le (n+1)T \tag{4-3-72}$$

The information is conveyed by the symbol sequences $\{I_n\}$ and $\{J_n\}$, which are
related to two independent binary information sequences $\{a_n\}$ and $\{b_n\}$ that
take values {0, 1}. We observe that the signal in (4-3-70) is a superposition of
two CPFSK signals of different amplitude. However, the sequences $\{I_n\}$ and
$\{J_n\}$ are not statistically independent, but are constrained in order to achieve
phase continuity in the superposition of the two components.

To elaborate, let us consider the case where $h = \tfrac12$ so that we have the
superposition of two MSK signals. At the symbol transition points, the two
amplitude components are either in phase or 180° out of phase. The phase
change in the signal is determined by the phase of the larger amplitude
component, while the amplitude change is determined by the smaller
component. Thus, the smaller component is constrained such that at the start
and end of each symbol interval, it is either in phase or 180° out of phase with
the larger component, independent of its phase. Under this constraint, the
symbol sequences $\{I_n\}$ and $\{J_n\}$ may be expressed as

$$I_n = 2a_n - 1, \qquad J_n = I_n(1 - 2b_n) \tag{4-3-73}$$

These relationships are summarized in Table 4-3-1.

TABLE 4-3-1

a_n   b_n   I_n   J_n   Amplitude-phase relations
 0     0    -1    -1    Amplitude is constant; phase decreases
 0     1    -1     1    Amplitude changes; phase decreases
 1     0     1     1    Amplitude is constant; phase increases
 1     1     1    -1    Amplitude changes; phase increases
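The mapping in (4-3-73) is small enough to tabulate directly. The sketch below regenerates the rows of Table 4-3-1 from the bit pair $(a_n, b_n)$. The function names are hypothetical, and the rule used for the descriptive column (amplitude constant when $J_n = I_n$, phase increasing when $I_n = +1$) is our reading of the table rather than a statement from the text.

```python
def symbol_pair(a_n, b_n):
    """Map information bits (a_n, b_n) in {0, 1} to (I_n, J_n) per (4-3-73)."""
    i_n = 2 * a_n - 1
    j_n = i_n * (1 - 2 * b_n)
    return i_n, j_n

def describe(a_n, b_n):
    """Return (I_n, J_n, amplitude behavior, phase behavior) for one row."""
    i_n, j_n = symbol_pair(a_n, b_n)
    amplitude = "constant" if j_n == i_n else "changes"   # assumed reading of the table
    phase = "increases" if i_n == 1 else "decreases"       # assumed reading of the table
    return i_n, j_n, amplitude, phase

# One entry per row of Table 4-3-1, keyed by (a_n, b_n).
table = {(a, b): describe(a, b) for a in (0, 1) for b in (0, 1)}
```

Each value in `table` can be compared line by line with the table above, which is a quick way to confirm that the two relations in (4-3-73) and the tabulated entries agree.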

As a generalization, a multiamplitude CPFSK signal with $N$ components
may be expressed as

$$s(t) = 2^{N-1}\cos[2\pi f_c t + \phi_N(t;\mathbf{I})] + \sum_{m=1}^{N-1} 2^{m-1}\cos[2\pi f_c t + \phi_m(t;\mathbf{J}_m)] \tag{4-3-74}$$

where

$$\phi_N(t;\mathbf{I}) = \pi h I_n\,\frac{t-nT}{T} + \pi h\sum_{k=-\infty}^{n-1} I_k, \qquad nT \le t \le (n+1)T \tag{4-3-75}$$

and

$$\phi_m(t;\mathbf{J}_m) = \pi I_n\left[h + \tfrac12(J_{mn}+1)\right]\frac{t-nT}{T} + \sum_{k=-\infty}^{n-1} \pi I_k\left[h + \tfrac12(J_{mk}+1)\right], \qquad nT \le t \le (n+1)T \tag{4-3-76}$$

The sequences $\{I_n\}$ and $\{J_{mn}\}$ are statistically independent, binary-valued
sequences that take values from the set {1, -1}.

From (4-3-75) and (4-3-76), we observe that each component in the sum
will be either in phase or 180° out of phase with the largest component at the
end of the $n$th symbol interval, i.e., at $t = (n+1)T$. Thus, the signal states are
specified by an amplitude level from the set of amplitudes $\{1, 3, 5, \ldots, 2^N - 1\}$
and a phase level from the set $\{0, \pi h, 2\pi h, \ldots, 2\pi - \pi h\}$. The phase constraint
is required to maintain the phase continuity of the CPM signal.

FIGURE 4-3-26 Signal space diagrams for two-component CPFSK.

Figure 4-3-26 illustrates the signal space diagrams for two-amplitude ($N = 2$)
CPFSK with $h = \tfrac14$, $\tfrac13$, $\tfrac12$, and $\tfrac23$. The signal space diagrams for three-component
($N = 3$) CPFSK are shown in Fig. 4-3-27. In this case, there are four amplitude
levels. The number of states depends on the modulation index $h$ as well as $N$.
Note that the beginning and ending points of the phase trajectories are marked
by dots.
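The amplitude and phase sets quoted above are easy to enumerate for a given number of components and a rational modulation index. The following sketch uses hypothetical function names and represents each phase as an exact fraction of $\pi$. It assumes only that $h$ is rational, so that the multiples of $\pi h$ taken modulo $2\pi$ form a finite set.

```python
from fractions import Fraction

def amplitude_levels(n_components):
    """Amplitude set {1, 3, 5, ..., 2^N - 1} for an N-component signal."""
    return list(range(1, 2 ** n_components, 2))

def phase_levels(h):
    """Distinct multiples of pi*h modulo 2*pi, returned as fractions of pi."""
    h = Fraction(h)
    levels = []
    phase = Fraction(0)
    while phase not in levels:          # stops once the sequence repeats
        levels.append(phase)
        phase = (phase + h) % 2
    return sorted(levels)

two_component_amplitudes = amplitude_levels(2)     # N = 2
three_component_amplitudes = amplitude_levels(3)   # N = 3
half_index_phases = phase_levels(Fraction(1, 2))   # h = 1/2
```

For $N = 3$ the list has four entries, in agreement with the four amplitude levels mentioned in the text, and for $h = \tfrac12$ the phase list ends at $3\pi/2 = 2\pi - \pi h$.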

Additional multiamplitude CPM signal formats may be obtained by using
pulse shapes other than rectangular, as well as signal pulses that span more
than one symbol (partial response).

FIGURE 4-3-27 Signal space diagrams for three-component CPFSK.


4-4 SPECTRAL CHARACTERISTICS OF DIGITALLY 
MODULATED SIGNALS 

In most digital communications systems, the available channel bandwidth is 
limited. Consequently, the system designer must consider the constraints 
imposed by the channel bandwidth limitation in the selection of the modulation
technique used to transmit the information. For this reason, it is important
for us to determine the spectral content of the digitally modulated signals 
described in Section 4-3. 

Since the information sequence is random, a digitally modulated signal is a 
stochastic process. We are interested in determining the power density 
spectrum of such a process. From the power density spectrum, we can 
determine the channel bandwidth required to transmit the information-bearing 
signal. Below, we first derive the spectral characteristics of the class of linearly
modulated signals. Then, we consider the nonlinear CPFSK, CPM, and
baseband modulated signals with memory.


4-4-1 Power Spectra of Linearly Modulated Signals 

Beginning with the form

$$s(t) = \mathrm{Re}\,[v(t)\,e^{j2\pi f_c t}]$$

which relates the bandpass signal $s(t)$ to the equivalent lowpass signal $v(t)$, we
may express the autocorrelation function of $s(t)$ as

$$\phi_{ss}(\tau) = \mathrm{Re}\,[\phi_{vv}(\tau)\,e^{j2\pi f_c \tau}] \tag{4-4-1}$$

where $\phi_{vv}(\tau)$ is the autocorrelation function of the equivalent lowpass signal
$v(t)$. The Fourier transform of (4-4-1) yields the desired expression for the
power density spectrum $\Phi_{ss}(f)$ in the form

$$\Phi_{ss}(f) = \tfrac12\,[\Phi_{vv}(f - f_c) + \Phi_{vv}(-f - f_c)] \tag{4-4-2}$$

where $\Phi_{vv}(f)$ is the power density spectrum of $v(t)$. It suffices to determine the
autocorrelation function and the power density spectrum of the equivalent
lowpass signal $v(t)$.

First we consider the linear digital modulation methods for which $v(t)$ is
represented in the general form

$$v(t) = \sum_{n=-\infty}^{\infty} I_n\,g(t - nT) \tag{4-4-3}$$

where the transmission rate is $1/T = R/k$ symbols/s and $\{I_n\}$ represents the
sequence of symbols that results from mapping $k$-bit blocks into corresponding
signal points selected from the appropriate signal space diagram. Observe that
in PAM, the sequence $\{I_n\}$ is real and corresponds to the amplitude values of
the transmitted signal, but in PSK, QAM, and combined PAM-PSK, the
sequence $\{I_n\}$ is complex-valued, since the signal points have a two-dimensional
representation.

The autocorrelation function of $v(t)$ is

$$\begin{aligned}
\phi_{vv}(t+\tau;t) &= \tfrac12 E[v^*(t)\,v(t+\tau)] \\
&= \tfrac12 \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} E[I_n^* I_m]\,g^*(t-nT)\,g(t+\tau-mT)
\end{aligned} \tag{4-4-4}$$

We assume that the sequence of information symbols $\{I_n\}$ is wide-sense
stationary with mean $\mu_i$ and autocorrelation function

$$\phi_{ii}(m) = \tfrac12 E[I_n^* I_{n+m}] \tag{4-4-5}$$

Hence (4-4-4) can be expressed as

$$\begin{aligned}
\phi_{vv}(t+\tau;t) &= \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} \phi_{ii}(m-n)\,g^*(t-nT)\,g(t+\tau-mT) \\
&= \sum_{m=-\infty}^{\infty} \phi_{ii}(m) \sum_{n=-\infty}^{\infty} g^*(t-nT)\,g(t+\tau-nT-mT)
\end{aligned} \tag{4-4-6}$$

The second summation in (4-4-6), namely,

$$\sum_{n=-\infty}^{\infty} g^*(t-nT)\,g(t+\tau-nT-mT)$$

is periodic in the $t$ variable with period $T$. Consequently, $\phi_{vv}(t+\tau;t)$ is also
periodic in the $t$ variable with period $T$. That is,

$$\phi_{vv}(t+T+\tau;\,t+T) = \phi_{vv}(t+\tau;t) \tag{4-4-7}$$

In addition, the mean value of $v(t)$, which is

$$E[v(t)] = \mu_i \sum_{n=-\infty}^{\infty} g(t-nT) \tag{4-4-8}$$

is periodic with period $T$. Therefore $v(t)$ is a stochastic process having a
periodic mean and autocorrelation function. Such a process is called a
cyclostationary process or a periodically stationary process in the wide sense, as
described in Section 2-2-6.

In order to compute the power density spectrum of a cyclostationary
process, the dependence of $\phi_{vv}(t+\tau;t)$ on the $t$ variable must be eliminated.
This can be accomplished simply by averaging $\phi_{vv}(t+\tau;t)$ over a single
period. Thus,

$$\begin{aligned}
\bar\phi_{vv}(\tau) &= \frac1T \int_{-T/2}^{T/2} \phi_{vv}(t+\tau;t)\,dt \\
&= \sum_{m=-\infty}^{\infty} \phi_{ii}(m) \sum_{n=-\infty}^{\infty} \frac1T \int_{-T/2}^{T/2} g^*(t-nT)\,g(t+\tau-nT-mT)\,dt \\
&= \sum_{m=-\infty}^{\infty} \phi_{ii}(m) \sum_{n=-\infty}^{\infty} \frac1T \int_{-T/2-nT}^{T/2-nT} g^*(t)\,g(t+\tau-mT)\,dt
\end{aligned} \tag{4-4-9}$$

We interpret the integral in (4-4-9) as the time-autocorrelation function of $g(t)$
and define it as

$$\phi_{gg}(\tau) = \int_{-\infty}^{\infty} g^*(t)\,g(t+\tau)\,dt \tag{4-4-10}$$

Consequently (4-4-9) can be expressed as

$$\bar\phi_{vv}(\tau) = \frac1T \sum_{m=-\infty}^{\infty} \phi_{ii}(m)\,\phi_{gg}(\tau - mT) \tag{4-4-11}$$

The Fourier transform of the relation in (4-4-11) yields the (average) power
density spectrum of $v(t)$ in the form

$$\Phi_{vv}(f) = \frac1T\,|G(f)|^2\,\Phi_{ii}(f) \tag{4-4-12}$$

where $G(f)$ is the Fourier transform of $g(t)$, and $\Phi_{ii}(f)$ denotes the power
density spectrum of the information sequence, defined as

$$\Phi_{ii}(f) = \sum_{m=-\infty}^{\infty} \phi_{ii}(m)\,e^{-j2\pi fmT} \tag{4-4-13}$$

The result (4-4-12) illustrates the dependence of the power density spectrum of
$v(t)$ on the spectral characteristics of the pulse $g(t)$ and the information
sequence $\{I_n\}$. That is, the spectral characteristics of $v(t)$ can be controlled by
design of the pulse shape $g(t)$ and by design of the correlation characteristics of
the information sequence.

Whereas the dependence of $\Phi_{vv}(f)$ on $G(f)$ is easily understood upon
observation of (4-4-12), the effect of the correlation properties of the
information sequence is more subtle. First of all, we note that for an arbitrary
autocorrelation $\phi_{ii}(m)$ the corresponding power density spectrum $\Phi_{ii}(f)$ is
periodic in frequency with period $1/T$. In fact, the expression (4-4-13) relating
the spectrum $\Phi_{ii}(f)$ to the autocorrelation $\phi_{ii}(m)$ is in the form of an
exponential Fourier series with the $\{\phi_{ii}(m)\}$ as the Fourier coefficients. As a
consequence, the autocorrelation sequence $\phi_{ii}(m)$ is given by

$$\phi_{ii}(m) = T\int_{-1/2T}^{1/2T} \Phi_{ii}(f)\,e^{j2\pi fmT}\,df \tag{4-4-14}$$


Second, let us consider the case in which the information symbols in the
sequence are real and mutually uncorrelated. In this case, the autocorrelation
function $\phi_{ii}(m)$ can be expressed as

$$\phi_{ii}(m) = \begin{cases} \sigma_i^2 + \mu_i^2 & (m = 0) \\ \mu_i^2 & (m \ne 0) \end{cases} \tag{4-4-15}$$


where $\sigma_i^2$ denotes the variance of an information symbol. When (4-4-15) is
used to substitute for $\phi_{ii}(m)$ in (4-4-13), we obtain

$$\Phi_{ii}(f) = \sigma_i^2 + \mu_i^2 \sum_{m=-\infty}^{\infty} e^{-j2\pi fmT} \tag{4-4-16}$$

The summation in (4-4-16) is periodic with period $1/T$. It may be viewed as
the exponential Fourier series of a periodic train of impulses with each impulse
having an area $1/T$. Therefore (4-4-16) can also be expressed in the form

$$\Phi_{ii}(f) = \sigma_i^2 + \frac{\mu_i^2}{T} \sum_{m=-\infty}^{\infty} \delta\!\left(f - \frac{m}{T}\right) \tag{4-4-17}$$

Substitution of (4-4-17) into (4-4-12) yields the desired result for the power
density spectrum of $v(t)$ when the sequence of information symbols is
uncorrelated. That is,

$$\Phi_{vv}(f) = \frac{\sigma_i^2}{T}\,|G(f)|^2 + \frac{\mu_i^2}{T^2} \sum_{m=-\infty}^{\infty} \left|G\!\left(\frac{m}{T}\right)\right|^2 \delta\!\left(f - \frac{m}{T}\right) \tag{4-4-18}$$

The expression (4-4-18) for the power density spectrum is purposely
separated into two terms to emphasize the two different types of spectral
components. The first term is the continuous spectrum, and its shape depends
only on the spectral characteristic of the signal pulse $g(t)$. The second term
consists of discrete frequency components spaced $1/T$ apart in frequency. Each
spectral line has a power that is proportional to $|G(f)|^2$ evaluated at $f = m/T$.
Note that the discrete frequency components vanish when the information
symbols have zero mean, i.e., $\mu_i = 0$. This condition is usually desirable for the
digital modulation techniques under consideration, and it is satisfied when
the information symbols are equally likely and symmetrically positioned in the
complex plane. Thus, the system designer can control the spectral characteristics
of the digitally modulated signal by proper selection of the characteristics
of the information sequence to be transmitted.


Example 4-4-1 

To illustrate the spectral shaping resulting from $g(t)$, consider the rectangular
pulse shown in Fig. 4-4-1(a). The Fourier transform of $g(t)$ is

$$G(f) = AT\,\frac{\sin \pi fT}{\pi fT}\,e^{-j\pi fT}$$

Hence

$$|G(f)|^2 = (AT)^2 \left(\frac{\sin \pi fT}{\pi fT}\right)^2 \tag{4-4-19}$$

FIGURE 4-4-1 Rectangular pulse and its energy density spectrum $|G(f)|^2$.

This spectrum is illustrated in Fig. 4-4-1(b). Note that it contains zeros at
multiples of $1/T$ in frequency and that it decays inversely as the square of
the frequency variable. As a consequence of the spectral zeros in $G(f)$, all
but one of the discrete spectral components in (4-4-18) vanish. Thus, upon
substitution for $|G(f)|^2$ from (4-4-19), (4-4-18) reduces to

$$\Phi_{vv}(f) = \sigma_i^2 A^2 T \left(\frac{\sin \pi fT}{\pi fT}\right)^2 + A^2 \mu_i^2\,\delta(f) \tag{4-4-20}$$
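The disappearance of all but one spectral line can be verified by evaluating the line weights of (4-4-18) with the rectangular-pulse spectrum (4-4-19). The sketch below does this for a few harmonics. The numerical values of $A$, $T$, and $\mu_i$ are arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def rect_energy_spectrum(f, A, T):
    """|G(f)|^2 of (4-4-19) for a rectangular pulse of amplitude A and width T."""
    x = math.pi * f * T
    sinc = 1.0 if x == 0.0 else math.sin(x) / x
    return (A * T * sinc) ** 2

def line_weight(m, A, T, mean):
    """Weight of the impulse at f = m/T in the discrete part of (4-4-18)."""
    return (mean / T) ** 2 * rect_energy_spectrum(m / T, A, T)

# Assumed illustrative values (not from the text).
A, T, mean = 2.0, 0.5, 0.3
weights = [line_weight(m, A, T, mean) for m in range(-3, 4)]   # m = -3 .. 3
```

Only the middle entry of `weights` (the line at $f = 0$) is nonzero, and it equals $A^2\mu_i^2$, which is the coefficient of $\delta(f)$ in (4-4-20).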


Example 4-4-2 

As a second illustration of the spectral shaping resulting from $g(t)$, we
consider the raised cosine pulse

$$g(t) = \frac{A}{2}\left[1 + \cos\frac{2\pi}{T}\left(t - \frac{T}{2}\right)\right], \qquad 0 \le t \le T \tag{4-4-21}$$

This pulse is graphically illustrated in Fig. 4-4-2(a). Its Fourier transform is
easily derived and it may be expressed in the form

$$G(f) = \frac{AT}{2}\,\frac{\sin \pi fT}{\pi fT\,(1 - f^2T^2)}\,e^{-j\pi fT} \tag{4-4-22}$$

FIGURE 4-4-2 Raised cosine pulse and its energy density spectrum $|G(f)|^2$.

The square of the magnitude of $G(f)$ is shown in Fig. 4-4-2(b). It is
interesting to note that the spectrum has zeros at $f = n/T$, $n = \pm 2, \pm 3,
\pm 4, \ldots$. Consequently, all the discrete spectral components in (4-4-18),
except the ones at $f = 0$ and $f = \pm 1/T$, vanish. When compared with the
spectrum of the rectangular pulse, the spectrum of the raised cosine
pulse has a broader main lobe but the tails decay inversely as $f^6$.
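As a sanity check on Example 4-4-2, the transform of the raised cosine pulse can be approximated by numerical integration and compared with the closed form (4-4-22). The sketch below uses a simple midpoint rule. The step count, the choice $A = T = 1$, and the function names are assumptions made for illustration only.

```python
import cmath
import math

def raised_cosine_pulse(t, A, T):
    """g(t) of (4-4-21) on [0, T], zero elsewhere."""
    if not 0.0 <= t <= T:
        return 0.0
    return 0.5 * A * (1.0 + math.cos(2.0 * math.pi / T * (t - T / 2.0)))

def numeric_transform(f, A, T, steps=2000):
    """Midpoint-rule approximation of the integral of g(t) exp(-j 2 pi f t)."""
    dt = T / steps
    total = 0j
    for k in range(steps):
        t = (k + 0.5) * dt
        total += raised_cosine_pulse(t, A, T) * cmath.exp(-2j * math.pi * f * t)
    return total * dt

def closed_form_magnitude(f, A, T):
    """|G(f)| from (4-4-22); the expression is 0/0 at f = 0 and f = +-1/T."""
    x = f * T
    return abs(0.5 * A * T * math.sin(math.pi * x) / (math.pi * x * (1.0 - x * x)))

A, T = 1.0, 1.0
mid_lobe_numeric = abs(numeric_transform(0.5 / T, A, T))
mid_lobe_closed = closed_form_magnitude(0.5 / T, A, T)
at_first_harmonic = abs(numeric_transform(1.0 / T, A, T))
at_second_harmonic = abs(numeric_transform(2.0 / T, A, T))
```

With these assumptions the numerical and closed-form magnitudes agree at $f = 1/2T$, the value at $f = 1/T$ is nonzero (it equals $AT/4$), and the value at $f = 2/T$ is zero to within rounding, consistent with the surviving spectral lines noted in the example.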


Example 4-4-3

To illustrate that spectral shaping can also be accomplished by operations
performed on the input information sequence, we consider a binary
sequence $\{b_n\}$ from which we form the symbols

$$I_n = b_n + b_{n-1} \tag{4-4-23}$$

The $\{b_n\}$ are assumed to be uncorrelated random variables, each having zero
mean and unit variance. Then the autocorrelation function of the sequence
$\{I_n\}$ is

$$\phi_{ii}(m) = \begin{cases} 2 & (m = 0) \\ 1 & (m = \pm 1) \\ 0 & (\text{otherwise}) \end{cases} \tag{4-4-24}$$

Hence, the power density spectrum of the input sequence is

$$\Phi_{ii}(f) = 2(1 + \cos 2\pi fT) = 4\cos^2 \pi fT \tag{4-4-25}$$

and the corresponding power density spectrum for the (lowpass) modulated
signal is

$$\Phi_{vv}(f) = \frac{4}{T}\,|G(f)|^2 \cos^2 \pi fT \tag{4-4-26}$$
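The autocorrelation values used in Example 4-4-3 can be reproduced by brute-force averaging. The sketch below assumes, for illustration, that the $b_n$ are independent and take the values ±1 with equal probability (one distribution with zero mean and unit variance). It enumerates all bit patterns that influence $I_n I_{n+m}$ and then evaluates the sum (4-4-13). The function names are hypothetical.

```python
import itertools
import math

def autocorrelation(m):
    """E[I_n I_{n+m}] for I_n = b_n + b_{n-1}, with equiprobable b_n = +-1."""
    span = abs(m) + 2                        # b_{n-1}, b_n, ..., b_{n+|m|}
    total = 0
    for bits in itertools.product((-1, 1), repeat=span):
        i_n = bits[1] + bits[0]                          # b_n + b_{n-1}
        i_shifted = bits[span - 1] + bits[span - 2]      # b_{n+|m|} + b_{n+|m|-1}
        total += i_n * i_shifted
    return total / 2 ** span

def sequence_spectrum(f, T, max_lag=3):
    """Sum of (4-4-13); the sequence is real and even, so cosines suffice."""
    return sum(autocorrelation(m) * math.cos(2.0 * math.pi * f * m * T)
               for m in range(-max_lag, max_lag + 1))

lags = {m: autocorrelation(m) for m in range(-2, 3)}
spectrum_sample = sequence_spectrum(0.3, 1.0)
```

The dictionary `lags` holds the values 2, 1, and 0 at lags 0, ±1, and ±2, and `spectrum_sample` matches $4\cos^2 \pi fT$ at $fT = 0.3$, in agreement with (4-4-24) and (4-4-25).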


4-4-2 Power Spectra of CPFSK and CPM Signals 

In this section, we derive the power density spectrum for the class of constant 
amplitude CPM signals that were described in Section 4-3-3. We begin by 
computing the autocorrelation function and its Fourier transform, as was done 
in the case of linearly modulated signals. 

The constant amplitude CPM signal is expressed as

$$s(t;\mathbf{I}) = A\cos[2\pi f_c t + \phi(t;\mathbf{I})] \tag{4-4-27}$$

where

$$\phi(t;\mathbf{I}) = 2\pi h \sum_{k=-\infty}^{\infty} I_k\,q(t - kT) \tag{4-4-28}$$

Each symbol in the sequence $\{I_n\}$ can take one of the $M$ values $\{\pm 1, \pm 3, \ldots,
\pm(M-1)\}$. These symbols are statistically independent and identically distributed
with prior probabilities

$$P_n = P(I_k = n), \qquad n = \pm 1, \pm 3, \ldots, \pm(M-1) \tag{4-4-29}$$

where $\sum_n P_n = 1$. The pulse $g(t) = q'(t)$ is zero outside of the interval $(0, LT)$,
$q(t) = 0$, $t < 0$, and $q(t) = \tfrac12$ for $t > LT$.

The autocorrelation function of the equivalent lowpass signal

$$v(t) = e^{j\phi(t;\mathbf{I})}$$

is

$$\phi_{vv}(t+\tau;t) = \tfrac12 E\left[\exp\left(j2\pi h \sum_{k=-\infty}^{\infty} I_k\,[q(t+\tau-kT) - q(t-kT)]\right)\right] \tag{4-4-30}$$

First we express the sum in the exponent as a product of exponentials. The
result is

$$\phi_{vv}(t+\tau;t) = \tfrac12 E\left[\prod_{k=-\infty}^{\infty} \exp\{j2\pi h I_k\,[q(t+\tau-kT) - q(t-kT)]\}\right] \tag{4-4-31}$$

Next, we perform the expectation over the data symbols $\{I_k\}$. Since these
symbols are statistically independent, we obtain

$$\phi_{vv}(t+\tau;t) = \tfrac12 \prod_{k=-\infty}^{\infty} \left(\sum_{\substack{n=-(M-1) \\ n\ \text{odd}}}^{M-1} P_n \exp\{j2\pi hn\,[q(t+\tau-kT) - q(t-kT)]\}\right) \tag{4-4-32}$$

Finally, the average autocorrelation function is

$$\bar\phi_{vv}(\tau) = \frac1T \int_0^T \phi_{vv}(t+\tau;t)\,dt \tag{4-4-33}$$


Although (4-4-32) implies that there are an infinite number of factors in the
product, the pulse $g(t) = q'(t) = 0$ for $t < 0$ and $t > LT$, and $q(t) = 0$ for $t < 0$.
Consequently only a finite number of terms in the product have nonzero
exponents. Thus, (4-4-32) can be simplified considerably. In addition, if we let
$\tau = \xi + mT$, where $0 \le \xi < T$ and $m = 0, 1, \ldots$, the average autocorrelation in
(4-4-33) reduces to

$$\bar\phi_{vv}(\xi + mT) = \frac{1}{2T} \int_0^T \prod_{k=1-L}^{m+1} \left(\sum_{\substack{n=-(M-1) \\ n\ \text{odd}}}^{M-1} P_n \exp\{j2\pi hn\,[q(t+\xi-(k-m)T) - q(t-kT)]\}\right) dt \tag{4-4-34}$$

Let us focus on $\bar\phi_{vv}(\xi + mT)$ for $\xi + mT \ge LT$. In this case, (4-4-34) may be
expressed as

$$\bar\phi_{vv}(\xi + mT) = [\psi(jh)]^{m-L}\,\lambda(\xi), \qquad m \ge L, \quad 0 \le \xi < T \tag{4-4-35}$$

where $\psi(jh)$ is the characteristic function of the random sequence $\{I_n\}$,
defined as

$$\psi(jh) = E(e^{j\pi h I_n}) = \sum_{\substack{n=-(M-1) \\ n\ \text{odd}}}^{M-1} P_n\,e^{j\pi hn} \tag{4-4-36}$$

and $\lambda(\xi)$ is the remaining part of the average autocorrelation function, which
may be expressed as

$$\lambda(\xi) = \frac{1}{2T} \int_0^T \prod_{k=1-L}^{0} \left(\sum_{\substack{n=-(M-1) \\ n\ \text{odd}}}^{M-1} P_n \exp\{j2\pi hn\,[\tfrac12 - q(t-kT)]\}\right) \prod_{k=1-L}^{1} \left(\sum_{\substack{n=-(M-1) \\ n\ \text{odd}}}^{M-1} P_n \exp[j2\pi hn\,q(t+\xi-kT)]\right) dt \tag{4-4-37}$$

Thus, $\bar\phi_{vv}(\tau)$ may be separated into a product of $\lambda(\xi)$ and $\psi(jh)$ as indicated in
(4-4-35) for $\tau = \xi + mT \ge LT$ and $0 \le \xi < T$. This property is used below.

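The Fourier transform of $\bar\phi_{vv}(\tau)$ yields the average power density spectrum
as

$$\Phi_{vv}(f) = \int_{-\infty}^{\infty} \bar\phi_{vv}(\tau)\,e^{-j2\pi f\tau}\,d\tau = 2\,\mathrm{Re}\left[\int_0^{\infty} \bar\phi_{vv}(\tau)\,e^{-j2\pi f\tau}\,d\tau\right] \tag{4-4-38}$$

But

$$\int_0^{\infty} \bar\phi_{vv}(\tau)\,e^{-j2\pi f\tau}\,d\tau = \int_0^{LT} \bar\phi_{vv}(\tau)\,e^{-j2\pi f\tau}\,d\tau + \int_{LT}^{\infty} \bar\phi_{vv}(\tau)\,e^{-j2\pi f\tau}\,d\tau \tag{4-4-39}$$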


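With the aid of (4-4-35), the integral in the range $LT \le \tau < \infty$ may be
expressed as

$$\int_{LT}^{\infty} \bar\phi_{vv}(\tau)\,e^{-j2\pi f\tau}\,d\tau = \sum_{m=L}^{\infty} \int_{mT}^{(m+1)T} \bar\phi_{vv}(\tau)\,e^{-j2\pi f\tau}\,d\tau \tag{4-4-40}$$

Now, let $\tau = \xi + mT$. Then (4-4-40) becomes

$$\begin{aligned}
\int_{LT}^{\infty} \bar\phi_{vv}(\tau)\,e^{-j2\pi f\tau}\,d\tau &= \sum_{m=L}^{\infty} \int_0^T \bar\phi_{vv}(\xi + mT)\,e^{-j2\pi f(\xi + mT)}\,d\xi \\
&= \sum_{m=L}^{\infty} \int_0^T \lambda(\xi)\,\psi^{m-L}(jh)\,e^{-j2\pi f(\xi + mT)}\,d\xi \\
&= \sum_{n=0}^{\infty} \psi^n(jh)\,e^{-j2\pi fnT} \int_0^T \lambda(\xi)\,e^{-j2\pi f(\xi + LT)}\,d\xi
\end{aligned} \tag{4-4-41}$$

A property of the characteristic function is $|\psi(jh)| \le 1$. For values of $h$ for
which $|\psi(jh)| < 1$, the summation in (4-4-41) converges and yields

$$\sum_{n=0}^{\infty} \psi^n(jh)\,e^{-j2\pi fnT} = \frac{1}{1 - \psi(jh)\,e^{-j2\pi fT}} \tag{4-4-42}$$

In this case, (4-4-41) reduces to

$$\int_{LT}^{\infty} \bar\phi_{vv}(\tau)\,e^{-j2\pi f\tau}\,d\tau = \frac{1}{1 - \psi(jh)\,e^{-j2\pi fT}} \int_0^T \bar\phi_{vv}(\xi + LT)\,e^{-j2\pi f(\xi + LT)}\,d\xi \tag{4-4-43}$$

where we have used $\lambda(\xi) = \bar\phi_{vv}(\xi + LT)$, which follows from (4-4-35) with
$m = L$. By combining (4-4-38), (4-4-39), and (4-4-43), we obtain the power density
spectrum of the CPM signal in the form

$$\Phi_{vv}(f) = 2\,\mathrm{Re}\left[\int_0^{LT} \bar\phi_{vv}(\tau)\,e^{-j2\pi f\tau}\,d\tau + \frac{1}{1 - \psi(jh)\,e^{-j2\pi fT}} \int_{LT}^{(L+1)T} \bar\phi_{vv}(\tau)\,e^{-j2\pi f\tau}\,d\tau\right] \tag{4-4-44}$$

This is the desired result when $|\psi(jh)| < 1$. In general, the power density
spectrum is evaluated numerically from (4-4-44). The average autocorrelation
function $\bar\phi_{vv}(\tau)$ for the range $0 \le \tau \le (L+1)T$ may be computed numerically
from (4-4-34).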
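The step from (4-4-41) to (4-4-43) rests on the geometric series (4-4-42). A short numerical sketch makes the convergence condition concrete. The values of $\psi$ and $fT$ below are arbitrary assumptions, and the function names are hypothetical.

```python
import cmath
import math

def partial_sum(psi, fT, terms):
    """First `terms` terms of the series on the left-hand side of (4-4-42)."""
    ratio = psi * cmath.exp(-2j * math.pi * fT)
    return sum(ratio ** n for n in range(terms))

def closed_form(psi, fT):
    """Right-hand side of (4-4-42); meaningful only when |psi| < 1."""
    return 1.0 / (1.0 - psi * cmath.exp(-2j * math.pi * fT))

# Assumed illustrative values: |psi| = 0.5 < 1, normalized frequency fT = 0.2.
converged = partial_sum(0.5, 0.2, 200)
limit = closed_form(0.5, 0.2)

# With |psi| = 1 the terms do not decay; at fT = 0 each term equals 1.
non_decaying = partial_sum(1.0, 0.0, 50)
```

For $|\psi| < 1$ the partial sums settle on the closed form, whereas for $|\psi| = 1$ they grow without bound at the frequencies where the terms align, which is the situation treated separately in the text that follows.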

For values of $h$ for which $|\psi(jh)| = 1$, e.g., $h = K$, where $K$ is an integer, we
can set

$$\psi(jh) = e^{j2\pi\nu}, \qquad 0 \le \nu < 1 \tag{4-4-45}$$

Then, the sum in (4-4-41) becomes

$$\sum_{n=0}^{\infty} e^{-j2\pi T(f - \nu/T)n} = \frac12 + \frac{1}{2T} \sum_{n=-\infty}^{\infty} \delta\!\left(f - \frac{\nu}{T} - \frac{n}{T}\right) - j\,\frac12 \cot \pi T\!\left(f - \frac{\nu}{T}\right) \tag{4-4-46}$$

Thus, the power density spectrum now contains impulses located at frequencies

$$f_n = \frac{n + \nu}{T}, \qquad 0 \le \nu < 1, \quad n = 0, 1, 2, \ldots \tag{4-4-47}$$

The result (4-4-46) can be combined with (4-4-41) and (4-4-39) to obtain the
entire power density spectrum, which includes both a continuous spectrum
component and a discrete spectrum component.
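The location of the spectral lines follows directly from the phase of $\psi(jh)$. The sketch below evaluates (4-4-36) for a given set of symbol probabilities, extracts $\nu$ as in (4-4-45), and lists the first few impulse frequencies of (4-4-47). The function names, the tolerance used to decide whether $|\psi| = 1$, and the example alphabets are assumptions for illustration.

```python
import cmath
import math

def characteristic_function(h, probabilities):
    """psi(jh) of (4-4-36); `probabilities` maps each symbol value n to P_n."""
    return sum(p * cmath.exp(1j * math.pi * h * n) for n, p in probabilities.items())

def impulse_frequencies(h, probabilities, T, count=3):
    """First `count` line frequencies (n + nu)/T, or [] when |psi| < 1."""
    psi = characteristic_function(h, probabilities)
    if abs(abs(psi) - 1.0) > 1e-9:
        return []                                   # no discrete lines
    nu = round(cmath.phase(psi) / (2.0 * math.pi), 12) % 1.0
    return [(n + nu) / T for n in range(count)]

binary = {-1: 0.5, 1: 0.5}                          # equiprobable binary symbols
lines_h1 = impulse_frequencies(1.0, binary, T=1.0)  # integer modulation index
lines_msk = impulse_frequencies(0.5, binary, T=1.0) # h = 1/2, psi = 0
```

For equiprobable binary symbols and $h = 1$ we have $\psi = -1$, so $\nu = \tfrac12$ and the lines fall at odd multiples of $1/2T$; for $h = \tfrac12$ the list is empty because $\psi = 0$.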

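Let us return to the case for which $|\psi(jh)| < 1$. When the symbols are
equally probable, i.e.,

$$P_n = \frac{1}{M} \qquad \text{for all } n$$

the characteristic function simplifies to the form

$$\psi(jh) = \frac{1}{M} \sum_{\substack{n=-(M-1) \\ n\ \text{odd}}}^{M-1} e^{j\pi hn} = \frac{1}{M}\,\frac{\sin M\pi h}{\sin \pi h} \tag{4-4-48}$$

Note that in this case $\psi(jh)$ is real. The average autocorrelation function given
by (4-4-34) also simplifies in this case to

$$\bar\phi_{vv}(\tau) = \frac{1}{2T} \int_0^T \prod_{k=1-L}^{[\tau/T]} \frac{1}{M}\,\frac{\sin 2\pi hM\,[q(t+\tau-kT) - q(t-kT)]}{\sin 2\pi h\,[q(t+\tau-kT) - q(t-kT)]}\,dt \tag{4-4-49}$$

where the upper limit $[\tau/T]$ denotes the smallest integer not less than $\tau/T$,
in agreement with the upper limit $m + 1$ of (4-4-34). The corresponding
expression for the power density spectrum, obtained by taking the real part in
(4-4-44) with $\psi(jh)$ and $\bar\phi_{vv}(\tau)$ real, reduces to

$$\begin{aligned}
\Phi_{vv}(f) = 2\Bigg[&\int_0^{LT} \bar\phi_{vv}(\tau) \cos 2\pi f\tau\,d\tau \\
&+ \frac{1 - \psi(jh) \cos 2\pi fT}{1 + \psi^2(jh) - 2\psi(jh) \cos 2\pi fT} \int_{LT}^{(L+1)T} \bar\phi_{vv}(\tau) \cos 2\pi f\tau\,d\tau \\
&- \frac{\psi(jh) \sin 2\pi fT}{1 + \psi^2(jh) - 2\psi(jh) \cos 2\pi fT} \int_{LT}^{(L+1)T} \bar\phi_{vv}(\tau) \sin 2\pi f\tau\,d\tau\Bigg]
\end{aligned} \tag{4-4-50}$$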

Power Density Spectrum of CPFSK A closed-form expression for the
power density spectrum can be obtained from (4-4-50) when the pulse shape
$g(t)$ is rectangular and zero outside the interval $[0, T]$. In this case, $q(t)$ is
linear for $0 \le t \le T$. The resulting power spectrum may be expressed as

$$\Phi_{vv}(f) = T\left[\frac{1}{M} \sum_{n=1}^{M} A_n^2(f) + \frac{2}{M^2} \sum_{n=1}^{M} \sum_{m=1}^{M} B_{nm}(f)\,A_n(f)\,A_m(f)\right] \tag{4-4-51}$$

where

$$\begin{aligned}
A_n(f) &= \frac{\sin \pi[fT - \tfrac12(2n-1-M)h]}{\pi[fT - \tfrac12(2n-1-M)h]} \\
B_{nm}(f) &= \frac{\cos(2\pi fT - \alpha_{nm}) - \psi \cos \alpha_{nm}}{1 + \psi^2 - 2\psi \cos 2\pi fT} \\
\alpha_{nm} &= \pi h(m + n - 1 - M) \\
\psi \equiv \psi(jh) &= \frac{\sin M\pi h}{M \sin \pi h}
\end{aligned} \tag{4-4-52}$$


The power density spectrum of CPFSK for $M = 2$, 4, and 8 is plotted in
Figs 4-4-3, 4-4-4, and 4-4-5 as a function of the normalized frequency $fT$, with
the modulation index $h = 2f_dT$ as a parameter. Note that only one-half of the
bandwidth occupancy is shown in these graphs. The origin corresponds to the
carrier $f_c$. The graphs illustrate that the spectrum of CPFSK is relatively
smooth and well confined for $h < 1$. As $h$ approaches unity, the spectra become
very peaked and, for $h = 1$ when $|\psi| = 1$, we find that impulses occur at $M$
frequencies. When $h > 1$ the spectrum becomes much broader. In communication
systems where CPFSK is used, the modulation index is designed to
conserve bandwidth, so that $h < 1$.
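The closed-form CPFSK spectrum (4-4-51) and (4-4-52) is straightforward to evaluate. The sketch below implements it for $|\psi| < 1$ and, as a consistency check, compares the binary case $h = \tfrac12$ with the MSK expression (4-4-53) for $A = 1$. The function names and the normalization $T = 1$ are assumptions, and the comparison frequencies are chosen to avoid the removable singularity of (4-4-53) at $fT = \tfrac14$.

```python
import math

def sinc(x):
    """sin(pi x) / (pi x), with the limiting value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def cpfsk_psd(fT, h, M, T=1.0):
    """Closed-form CPFSK spectrum of (4-4-51) and (4-4-52), for |psi| < 1."""
    psi = math.sin(M * math.pi * h) / (M * math.sin(math.pi * h))
    a = [sinc(fT - 0.5 * (2 * n - 1 - M) * h) for n in range(1, M + 1)]
    denom = 1.0 + psi * psi - 2.0 * psi * math.cos(2.0 * math.pi * fT)
    total = sum(x * x for x in a) / M
    for n in range(1, M + 1):
        for m in range(1, M + 1):
            alpha = math.pi * h * (m + n - 1 - M)
            b = (math.cos(2.0 * math.pi * fT - alpha) - psi * math.cos(alpha)) / denom
            total += 2.0 / (M * M) * b * a[n - 1] * a[m - 1]
    return T * total

def msk_psd(fT, T=1.0):
    """MSK spectrum of (4-4-53) with A = 1; not evaluated at fT = 1/4."""
    ratio = math.cos(2.0 * math.pi * fT) / (1.0 - 16.0 * fT * fT)
    return 16.0 * T / math.pi ** 2 * ratio ** 2

check_points = [0.0, 0.1, 0.5, 0.9]
differences = [abs(cpfsk_psd(x, 0.5, 2) - msk_psd(x)) for x in check_points]
```

The entries of `differences` are zero to within rounding, so the general expression reduces to the MSK spectrum for $M = 2$ and $h = \tfrac12$. The same function can be evaluated for other values of $h$ and $M$ with $|\psi| < 1$.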

The special case of binary CPFSK with $h = \tfrac12$ (or $f_d = 1/4T$) and $\psi = 0$
corresponds to MSK. In this case, the spectrum of the signal is

$$\Phi_{vv}(f) = \frac{16A^2T}{\pi^2} \left(\frac{\cos 2\pi fT}{1 - 16f^2T^2}\right)^2 \tag{4-4-53}$$

where the signal amplitude $A = 1$ in (4-4-52). In contrast, the spectrum of
four-phase offset (quadrature) PSK (OQPSK) with a rectangular pulse $g(t)$ of
duration $T$ is

$$\Phi_{vv}(f) = A^2T \left(\frac{\sin \pi fT}{\pi fT}\right)^2 \tag{4-4-54}$$

If we compare these spectral characteristics, we should normalize the
frequency variable by the bit rate or the bit interval $T_b$. Since MSK is binary
FSK, it follows that $T = T_b$ in (4-4-53). On the other hand, in OQPSK, $T = 2T_b$
so that (4-4-54) becomes

$$\Phi_{vv}(f) = 2A^2T_b \left(\frac{\sin 2\pi fT_b}{2\pi fT_b}\right)^2 \tag{4-4-55}$$

The spectra of the MSK and OQPSK signals are illustrated in Fig. 4-4-6.
Note that the main lobe of MSK is 50% wider than that for OQPSK. However,
the side lobes in MSK fall off considerably faster. For example, if we compare the
bandwidth $W$ that contains 99% of the total power, we find that $W = 1.2/T_b$ for
MSK and $W \approx 8/T_b$ for OQPSK. Consequently, MSK has a narrower spectral
occupancy when viewed in terms of fractional out-of-band power above
$fT_b = 1$. Graphs for the fractional out-of-band power for OQPSK and MSK are
shown in Fig. 4-4-7. Note that MSK is significantly more bandwidth-efficient
than QPSK. This efficiency accounts for the popularity of MSK in many digital
communications systems.

FIGURE 4-4-3 Power density spectrum of binary CPFSK. (Panels: spectral
density for two-level CPFSK.)

FIGURE 4-4-4 Power density spectrum of quaternary CPFSK. (Panels: spectral
density for four-level CPFSK versus normalized frequency $fT$.)

Even greater bandwidth efficiency than MSK can be achieved by reducing
the modulation index. However, the FSK signals will no longer be orthogonal
and there will be an increase in the error probability.
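The main-lobe comparison between MSK and OQPSK can be reproduced from (4-4-53) and (4-4-55) by locating the first spectral null of each. The sketch below assumes $A = 1$ and $T_b = 1$ and searches a uniform frequency grid. The grid step, the null threshold, and the function names are illustrative assumptions.

```python
import math

def msk_psd(fTb):
    """(4-4-53) with A = 1 and T = T_b = 1; the 0/0 at fT_b = 1/4 has limit 1."""
    denom = 1.0 - 16.0 * fTb * fTb
    if abs(denom) < 1e-9:
        return 1.0
    return 16.0 / math.pi ** 2 * (math.cos(2.0 * math.pi * fTb) / denom) ** 2

def oqpsk_psd(fTb):
    """(4-4-55) with A = 1 and T_b = 1."""
    x = 2.0 * math.pi * fTb
    return 2.0 if x == 0.0 else 2.0 * (math.sin(x) / x) ** 2

def first_null(psd, step=0.001, limit=2.0):
    """Smallest positive grid frequency where the spectrum is numerically zero."""
    k = 1
    while k * step <= limit:
        if psd(k * step) < 1e-12:
            return k * step
        k += 1
    return None

msk_null = first_null(msk_psd)        # expected near fT_b = 0.75
oqpsk_null = first_null(oqpsk_psd)    # expected near fT_b = 0.5
lobe_ratio = msk_null / oqpsk_null
```

The first nulls fall at $fT_b = 0.75$ for MSK and $fT_b = 0.5$ for OQPSK, so `lobe_ratio` is 1.5, which is the 50% wider main lobe noted in the text. The faster side-lobe decay of MSK is visible from the $f^{-4}$ behavior of (4-4-53) compared with the $f^{-2}$ behavior of (4-4-55).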

Spectral Characteristics of CPM In general, the bandwidth occupancy of
CPM depends on the choice of the modulation index $h$, the pulse shape $g(t)$,
and the number of signals $M$. As we have observed for CPFSK, small values of
$h$ result in CPM signals with relatively small bandwidth occupancy, while large
values of $h$ result in signals with large bandwidth occupancy. This is also the
case for the more general CPM signals.

FIGURE 4-4-5 Power density spectrum of octal CPFSK. (Panels: spectral
density for eight-level CPFSK.)

FIGURE 4-4-6 Power density spectra of MSK and offset QPSK. [From
Gronemeyer and McBride (1976); © 1976 IEEE.]

FIGURE 4-4-7 Fractional out-of-band power (normalized two-sided
bandwidth = 2BT). [From Gronemeyer and McBride (1976); © 1976 IEEE.]

The use of smooth pulses such as raised cosine pulses of the form

$$g(t) = \begin{cases} \dfrac{1}{2LT}\left(1 - \cos\dfrac{2\pi t}{LT}\right) & (0 \le t \le LT) \\ 0 & (\text{otherwise}) \end{cases} \tag{4-4-56}$$

where $L = 1$ for full response and $L > 1$ for partial response, results in smaller
bandwidth occupancy and, hence, greater bandwidth efficiency than the use of
rectangular pulses. For example, Fig. 4-4-8 illustrates the power density
spectrum for binary CPM with different partial response raised cosine (LRC)
pulses when $h = \tfrac12$. For comparison, the spectrum of binary CPFSK is also
shown. Note that as $L$ increases the pulse $g(t)$ becomes smoother and the
corresponding spectral occupancy of the signal is reduced.

FIGURE 4-4-8 Power density spectrum for binary CPM with $h = \tfrac12$ and different
pulse shapes. [From Aulin et al. (1981); © 1981 IEEE.]

The effect of varying the modulation index in a CPM signal is illustrated in
Fig. 4-4-9 for the case of $M = 4$ and a raised cosine pulse of the form given in
(4-4-56) with $L = 3$. Note that these spectral characteristics are similar to the
ones illustrated previously for CPFSK, except that these spectra are narrower
due to the use of a smoother pulse shape.

FIGURE 4-4-9 Power density spectrum for M = 4 CPM with 3RC and different
modulation indices. [From Aulin et al. (1981); © 1981 IEEE.]

Finally, in Fig. 4-4-10, we illustrate the fractional out-of-band power for
two-amplitude CPFSK with several different values of $h$.

FIGURE 4-4-10 Fractional out-of-band power for two-component CPFSK.
(Mulligan, 1988.)


4-4-3 Power Spectra of Modulated Signals with Memory

In the last two sections, we have determined the spectral characteristics for the 
class of linearly modulated signals without memory and for the class of 
angle-modulated signals such as CPFSK and CPM, which are nonlinear and 
possess memory. In this section, we consider the spectral characteristics of 
linearly modulated signals that have memory that can be modeled by a Markov 
chain. We have already encountered such signals in Section 4-3-2, where we 
described several types of baseband signals. 

The power density spectrum of a digitally modulated signal that is 
generated by a Markov chain may be derived by following the basic procedure 
given in the previous section. Thus, we can determine the autocorrelation 
function and then evaluate its Fourier transform to obtain the power density 
spectrum. For signals that are generated by a Markov chain with transition 
probability matrix P, the power density spectrum of the modulated signal may 
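be expressed in the general form (see Titsworth and Welch, 1961)

$$\begin{aligned}
\Phi(f) = {} & \frac{1}{T^2} \sum_{n=-\infty}^{\infty} \left|\sum_{i=1}^{K} p_i\,S_i\!\left(\frac{n}{T}\right)\right|^2 \delta\!\left(f - \frac{n}{T}\right) + \frac{1}{T} \sum_{i=1}^{K} p_i\,|S_i'(f)|^2 \\
& + \frac{2}{T}\,\mathrm{Re}\left[\sum_{i=1}^{K} \sum_{j=1}^{K} p_i\,S_i'^{*}(f)\,S_j'(f)\,P_{ij}(f)\right]
\end{aligned} \tag{4-4-57}$$

where $S_i(f)$ is the Fourier transform of the signal waveform $s_i(t)$, $S_i'(f)$ is
the Fourier transform of

$$s_i'(t) = s_i(t) - \sum_{k=1}^{K} p_k\,s_k(t)$$

$P_{ij}(f)$ is the Fourier transform of the discrete-time sequence $p_{ij}(n)$, defined as

$$P_{ij}(f) = \sum_{n=1}^{\infty} p_{ij}(n)\,e^{-j2\pi nfT} \tag{4-4-58}$$

and $K$ is the number of states of the modulator. The term $p_{ij}(n)$ denotes the
probability that the signal $s_j(t)$ is transmitted $n$ signaling intervals after
the transmission of $s_i(t)$. Hence, $\{p_{ij}(n)\}$ are the transition probabilities in the
transition probability matrix $\mathbf{P}^n$. Note that $p_{ij}(1) = p_{ij}$.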

When there is no memory in the modulation method, the signal waveform 
transmitted on each signaling interval is independent of the waveforms 
transmitted in previous signaling intervals. The power density spectrum of the 
resultant signal may still be expressed in the form of (4-4-57), if the transition 
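probability matrix is replaced by

$$\mathbf{P} = \begin{bmatrix} p_1 & p_2 & \cdots & p_K \\ p_1 & p_2 & \cdots & p_K \\ \vdots & \vdots & & \vdots \\ p_1 & p_2 & \cdots & p_K \end{bmatrix} \tag{4-4-59}$$

and we impose the condition that $\mathbf{P}^n = \mathbf{P}$ for all $n \ge 1$. Under these conditions,
the expression for the power density spectrum becomes a function of the
stationary state probabilities $\{p_i\}$ only, and, hence, it reduces to the simpler
form

$$\begin{aligned}
\Phi(f) = {} & \frac{1}{T^2} \sum_{n=-\infty}^{\infty} \left|\sum_{i=1}^{K} p_i\,S_i\!\left(\frac{n}{T}\right)\right|^2 \delta\!\left(f - \frac{n}{T}\right) + \frac{1}{T} \sum_{i=1}^{K} p_i(1 - p_i)\,|S_i(f)|^2 \\
& - \frac{2}{T} \sum_{i=1}^{K} \sum_{\substack{j=1 \\ i<j}}^{K} p_i p_j\,\mathrm{Re}\,[S_i(f)\,S_j^*(f)]
\end{aligned} \tag{4-4-60}$$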


We observe that our previous result for the power density spectrum of 
memoryless linear modulation given by (4-4-18) may be viewed as a special 
case of (4-4-60) in which all waveforms are identical except for a set of scale 
factors that convey the digital information (Problem 4-30). 

We also make the observation that the first term in the expression for the 
power density spectrum given by either (4-4-57) or (4-4-60) consists of discrete 
frequency components. This line spectrum vanishes when

$$\sum_{i=1}^{K} p_i\,S_i\!\left(\frac{n}{T}\right) = 0 \tag{4-4-61}$$


The condition (4-4-61) is usually imposed in the design of practical digital 
communications systems and is easily satisfied by an appropriate choice of 
signaling waveforms (Problem 4-31). 

Now, let us determine the power density spectrum of the baseband-modulated
signals described in Section 4-3-2. First, the NRZ signal is
characterized by the two waveforms $s_1(t) = g(t)$ and $s_2(t) = -g(t)$, where $g(t)$ is
a rectangular pulse of amplitude $A$. For $K = 2$, (4-4-60) reduces to

$$\Phi(f) = \frac{(2p-1)^2}{T^2} \sum_{n=-\infty}^{\infty} \left|G\!\left(\frac{n}{T}\right)\right|^2 \delta\!\left(f - \frac{n}{T}\right) + \frac{4p(1-p)}{T}\,|G(f)|^2 \tag{4-4-62}$$

where

$$|G(f)|^2 = (AT)^2 \left(\frac{\sin \pi fT}{\pi fT}\right)^2 \tag{4-4-63}$$

Observe that when $p = \tfrac12$, the line spectrum vanishes and $\Phi(f)$ reduces to

$$\Phi(f) = \frac{1}{T}\,|G(f)|^2 \tag{4-4-64}$$

The NRZI signal is characterized by the transition probability matrix

$$\mathbf{P} = \begin{bmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 \end{bmatrix} \tag{4-4-65}$$

Notice that in this case $\mathbf{P}^n = \mathbf{P}$ for all $n \ge 1$. Hence, the special form for the
power density spectrum given by (4-4-62) applies to this modulation format as
well. Consequently, the power density spectrum for the NRZI signal is
identical to the spectrum of the NRZ signal.
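The reduction of (4-4-60) to (4-4-62) can be illustrated for any set of waveforms that are scaled copies of one pulse, $s_i(t) = a_i g(t)$ with real $a_i$. This restriction, the function name, and the sample probabilities are our assumptions for the sketch. In that case $S_i(f) = a_i G(f)$, and (4-4-60) collapses to a line term and a continuous term whose coefficients are computed below.

```python
def memoryless_coefficients(scales, probabilities):
    """Coefficients of (4-4-60) when every waveform is s_i(t) = a_i g(t).

    Returns (line, continuous), where the discrete part of the spectrum is
    (line / T^2) * sum_n |G(n/T)|^2 delta(f - n/T) and the continuous part is
    (continuous / T) * |G(f)|^2.
    """
    line = sum(p * a for a, p in zip(scales, probabilities)) ** 2
    continuous = sum(p * (1.0 - p) * a * a for a, p in zip(scales, probabilities))
    for i in range(len(scales)):
        for j in range(i + 1, len(scales)):
            continuous -= 2.0 * probabilities[i] * probabilities[j] * scales[i] * scales[j]
    return line, continuous

# NRZ: s_1 = g, s_2 = -g, with P(s_1) = p (assumed p = 0.3 for illustration).
p = 0.3
nrz_line, nrz_continuous = memoryless_coefficients([1.0, -1.0], [p, 1.0 - p])
balanced_line, balanced_continuous = memoryless_coefficients([1.0, -1.0], [0.5, 0.5])
```

For NRZ the two coefficients equal $(2p-1)^2$ and $4p(1-p)$, as in (4-4-62), and for $p = \tfrac12$ the line term vanishes. More generally, the two coefficients are the squared mean and the variance of the scale factors, which is the structure of (4-4-18).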

Delay modulation has a transition probability matrix 


0 J 
0 0 


0 r 
2 2 
0 0 
i 0_ 


(4-4-66) 


and stationary state probabilities p, = \ for r = 1, 2, 3, 4. Powers of P may be 
obtained by use of the relation 


PV = -ip 

where p is the signal correlation matrix with elements 


P„ = J- f s,{()s,(t)dt 


(4-4-67) 


(4-4-68) 


and where the four signals $\{s_i(t), i = 1, 2, 3, 4\}$ are shown in Fig. 4-3-15. It is
easily seen that

$$\boldsymbol{\rho} = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{bmatrix} \tag{4-4-69}$$

Consequently, powers of $\mathbf{P}$ can be generated from the relation

$$\mathbf{P}^{k+4} \boldsymbol{\rho} = -\tfrac14 \mathbf{P}^k \boldsymbol{\rho}, \qquad k \ge 1 \tag{4-4-70}$$

Use of (4-4-66), (4-4-69), and (4-4-70) in (4-4-57) yields the power density
spectrum of delay modulation. It may be expressed in the form

$$\Phi(f) = \frac{1}{2\psi^2 (17 + 8\cos 8\psi)} \left[ 23 - 2\cos\psi - 22\cos 2\psi - 12\cos 3\psi + 5\cos 4\psi + 12\cos 5\psi + 2\cos 6\psi - 8\cos 7\psi + 2\cos 8\psi \right] \tag{4-4-71}$$


where $\psi = \pi f T$.

FIGURE 4-4-11 Power spectral density (one-sided) of Miller code (delay
modulation) and NRZ/NRZI baseband signals. [From Hecht and Guido (1969);
© 1969 IEEE.]



The spectra of these baseband signals are illustrated in Fig. 4-4-11. Observe 
that the spectra of the NRZ and NRZI signals peak at f = 0. Delay modulation
has a narrower spectrum and a relatively small zero-frequency content. Its 
bandwidth occupancy is significantly smaller than that of the NRZ signal. 
These two characteristics make delay modulation an attractive choice for 
channels that do not pass dc, such as magnetic recording media. 
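The matrix relation used to generate powers of $\mathbf{P}$ for delay modulation can be checked with exact rational arithmetic. The sketch below is illustrative only. It assumes the transition matrix has rows $(0, \tfrac12, 0, \tfrac12)$, $(0, 0, \tfrac12, \tfrac12)$, $(\tfrac12, \tfrac12, 0, 0)$, $(\tfrac12, 0, \tfrac12, 0)$ and the correlation matrix has rows $(1, 0, 0, -1)$, $(0, 1, -1, 0)$, $(0, -1, 1, 0)$, $(-1, 0, 0, 1)$. It also evaluates the closed-form spectrum with $\psi = \pi f T$. The helper names `matmul`, `matpow`, and `miller_spectrum` are hypothetical.

```python
import math
from fractions import Fraction

H = Fraction(1, 2)
# Assumed transition probability matrix of delay modulation.
P = [[0, H, 0, H],
     [0, 0, H, H],
     [H, H, 0, 0],
     [H, 0, H, 0]]
# Assumed signal correlation matrix.
RHO = [[1, 0, 0, -1],
       [0, 1, -1, 0],
       [0, -1, 1, 0],
       [-1, 0, 0, 1]]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matpow(a, n):
    """n-th power of a square matrix by repeated multiplication."""
    size = len(a)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, a)
    return result


# Check P^4 rho = -(1/4) rho exactly.
lhs = matmul(matpow(P, 4), RHO)
rhs = [[-Fraction(1, 4) * x for x in row] for row in RHO]
relation_holds = lhs == rhs

# Every column of P sums to 1, so equal state probabilities 1/4 are stationary.
uniform_is_stationary = all(sum(P[i][j] for i in range(4)) == 1 for j in range(4))


def miller_spectrum(psi):
    """Closed-form delay-modulation spectrum as a function of psi = pi f T."""
    coeffs = [23, -2, -22, -12, 5, 12, 2, -8, 2]
    bracket = sum(c * math.cos(k * psi) for k, c in enumerate(coeffs))
    return bracket / (2.0 * psi ** 2 * (17.0 + 8.0 * math.cos(8.0 * psi)))


value_at_half_rate = miller_spectrum(math.pi / 2)  # f T = 1/2
```

At $\psi = \pi/2$ the bracket equals 50 and the denominator equals $12.5\pi^2$, so the sketch should return $4/\pi^2$ there. The bracket also sums to zero at $\psi = 0$, so the expression remains finite as $f \to 0$.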


4-5 BIBLIOGRAPHICAL NOTES AND REFERENCES 

The characteristics of signals and systems given in this chapter are very useful 
in the design of optimum modulation/demodulation and coding/decoding 
techniques for a variety of channel models. In particular, the digital modula- 
tion methods introduced in this chapter are widely used in digital communica- 
tion systems. The next chapter is concerned with optimum demodulation 
techniques for these signals and their performance in an additive, white 
gaussian noise channel. A general reference for signal characterization is the 
book by Franks (1969). 

Of particular importance in the design of digital communications systems 
are the spectral characteristics of the digitally modulated signals, which are 
presented in this chapter in some depth. Of these modulation techniques, CPM 
is one of the most important due to its efficient use of bandwidth. For this 
reason, it has been widely investigated by many researchers, and a large 
number of papers have been published in the technical literature. The most 
comprehensive treatment of CPM, including its performance and its spectral 
characteristics, can be found in the book by Anderson et al. (1986). In addition 
to this text, the tutorial paper by Sundberg (1986) presents the basic concepts 
and an overview of the performance characteristics of various CPM techniques. 
This paper also contains over 100 references to published papers on this topic. 

There are a large number of references dealing with the spectral
characteristics of CPFSK and CPM. As a point of reference, we should mention that
MSK was invented by Doelz and Heald in 1961. The early work on the power
spectral density of CPFSK and CPM was done by Bennett and Rice (1963),
Anderson and Salz (1965), and Bennett and Davey (1965). The book by Lucky 
et al. (1968) also contains a treatment of the spectral characteristics of CPFSK. 
Most of the recent work is referenced in the paper by Sundberg (1986). We 
should also cite the special issue on bandwidth-efficient modulation and coding 
published by the IEEE Transactions on Communications (March 1981), which 
contains several papers on the spectral characteristics and performance of 
CPM. 

The generalization of MSK to multiple amplitudes was investigated by 
Weber et al. (1978). The combination of multiple amplitudes with general CPM
was proposed by Mulligan (1988) who investigated its spectral characteristics 
and its error probability performance in gaussian noise with and without 
coding. 


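4-1 Prove the following properties of Hilbert transforms:
a If $x(t) = x(-t)$ then $\hat{x}(t) = -\hat{x}(-t)$;
b If $x(t) = -x(-t)$ then $\hat{x}(t) = \hat{x}(-t)$;
c If $x(t) = \cos \omega_0 t$ then $\hat{x}(t) = \sin \omega_0 t$;
d If $x(t) = \sin \omega_0 t$ then $\hat{x}(t) = -\cos \omega_0 t$;
e $\hat{\hat{x}}(t) = -x(t)$;
f $\int_{-\infty}^{\infty} x^2(t)\, dt = \int_{-\infty}^{\infty} \hat{x}^2(t)\, dt$;
g $\int_{-\infty}^{\infty} x(t) \hat{x}(t)\, dt = 0$.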

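4-2 If $x(t)$ is a stationary random process with autocorrelation function $\phi_{xx}(\tau) =
E[x(t)x(t + \tau)]$ and spectral density $\Phi_{xx}(f)$ then show that $\phi_{\hat{x}\hat{x}}(\tau) = \phi_{xx}(\tau)$,
$\phi_{x\hat{x}}(\tau) = -\hat{\phi}_{xx}(\tau)$, and $\Phi_{\hat{x}\hat{x}}(f) = \Phi_{xx}(f)$.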

4-3 Suppose that $n(t)$ is a zero-mean stationary narrowband process represented by
either (4-1-37), (4-1-38), or (4-1-39). The autocorrelation function of the
equivalent lowpass process $z(t) = x(t) + jy(t)$ is defined as

$$\phi_{zz}(\tau) = \tfrac12 E[z^*(t) z(t + \tau)]$$

a Show that

$$E[z(t) z(t + \tau)] = 0$$

b Suppose $\phi_{zz}(\tau) = N_0 \delta(\tau)$, and let



Determine $E(V^2)$ and $E(VV^*) = E(|V|^2)$.

4-4 Determine the autocorrelation function of the stochastic process

$$x(t) = A \sin(2\pi f_c t + \theta)$$

where $f_c$ is a constant and $\theta$ is a uniformly distributed phase, i.e.,

$$p(\theta) = \frac{1}{2\pi}, \qquad 0 \le \theta \le 2\pi$$

4-5 Prove that $s_l(t)$ is generally a complex-valued signal and give the condition under
which it is real. Assume that $s(t)$ is a real-valued bandpass signal.

4-6 Suppose that $s(t)$ is either a real- or complex-valued signal that is represented as a
linear combination of orthonormal functions $\{f_n(t)\}$, i.e.,

$$\hat{s}(t) = \sum_{k=1}^{K} s_k f_k(t)$$

where

$$\int_{-\infty}^{\infty} f_n(t) f_m^*(t)\, dt = \begin{cases} 0 & (m \ne n) \\ 1 & (m = n) \end{cases}$$

Determine the expressions for the coefficients $\{s_k\}$ in the expansion $\hat{s}(t)$ that
minimize the energy

$$\mathcal{E}_e = \int_{-\infty}^{\infty} |s(t) - \hat{s}(t)|^2\, dt$$

and the corresponding residual error $\mathcal{E}_e$.

4-7 Suppose that a set of M signal waveforms $\{s_{lm}(t)\}$ are complex-valued. Derive the
equations for the Gram-Schmidt procedure that will result in a set of $N \le M$
orthonormal signal waveforms.

4-8 Determine the correlation coefficients $\rho_{km}$ among the four signal waveforms $\{s_i(t)\}$
shown in Fig. 4-2-1, and the corresponding Euclidean distances.

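4-9 Consider a set of M orthogonal signal waveforms $s_m(t)$, $1 \le m \le M$, $0 \le t \le T$, all
of which have the same energy $\mathcal{E}$. Define a new set of M waveforms as

$$s_m'(t) = s_m(t) - \frac{1}{M} \sum_{k=1}^{M} s_k(t), \qquad 1 \le m \le M, \quad 0 \le t \le T$$

Show that the M signal waveforms $\{s_m'(t)\}$ have equal energy, given by

$$\mathcal{E}' = (M - 1)\mathcal{E}/M$$

and are equally correlated, with correlation coefficient

$$\rho_{mn} = \frac{1}{\mathcal{E}'} \int_0^T s_m'(t) s_n'(t)\, dt = -\frac{1}{M - 1}$$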

4-10 Consider the three waveforms $f_n(t)$ shown in Fig. P4-10.
a Show that these waveforms are orthonormal.

b Express the waveform $x(t)$ as a weighted linear combination of $f_n(t)$, $n = 1, 2, 3$,
if

$$x(t) = \begin{cases} -1 & (0 \le t < 1) \\ 1 & (1 \le t < 3) \\ -1 & (3 \le t < 4) \end{cases}$$

and determine the weighting coefficients.

4-11 Consider the four waveforms shown in Fig. P4-11. 

a Determine the dimensionality of the waveforms and a set of basis functions.
b Use the basis functions to represent the four waveforms by vectors $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$,
and $\mathbf{s}_4$.

c Determine the minimum distance between any pair of vectors. 

4-12 Determine a set of orthonormal functions for the four signals shown in Fig. P4-12.

4-13 A lowpass gaussian stochastic process $x(t)$ has a power spectral density

$$\Phi(f) = \begin{cases} N_0 & (|f| < B) \\ 0 & (|f| > B) \end{cases}$$

Determine the power spectral density and the autocorrelation function of
$y(t) = x^2(t)$.

4-14 Consider an equivalent lowpass digitally modulated signal of the form

$$u(t) = \sum_n \left[ a_n g(t - 2nT) - j b_n g(t - 2nT - T) \right]$$

where $\{a_n\}$ and $\{b_n\}$ are two sequences of statistically independent binary digits and
$g(t)$ is a sinusoidal pulse defined as

$$g(t) = \begin{cases} \sin(\pi t / 2T) & (0 < t < 2T) \\ 0 & (\text{otherwise}) \end{cases}$$

This type of signal is viewed as a four-phase PSK signal in which the pulse shape is
one-half cycle of a sinusoid. Each of the information sequences $\{a_n\}$ and $\{b_n\}$ is
transmitted at a rate of 1/2T bits/s and, hence, the combined transmission rate is
1/T bits/s. The two sequences are staggered in time by T seconds in transmission.
Consequently, the signal $u(t)$ is called staggered four-phase PSK.
a Show that the envelope $|u(t)|$ is a constant, independent of the information $a_n$ on
the in-phase component and information $b_n$ on the quadrature component. In
other words, the amplitude of the carrier used in transmitting the signal is
constant.

b Determine the power density spectrum of $u(t)$.

c Compare the power density spectrum obtained from (b) with the power density 
spectrum of the MSK signal. What conclusion can you draw from this 
comparison? 

4-15 Consider a four-phase PSK signal represented by the equivalent lowpass signal

$$u(t) = \sum_n I_n g(t - nT)$$

where $I_n$ takes on one of the four possible values $\sqrt{\tfrac12}(\pm 1 \pm j)$ with equal
probability. The sequence of information symbols $\{I_n\}$ is statistically independent.
a Determine and sketch the power density spectrum of $u(t)$ when

$$g(t) = \begin{cases} A & (0 \le t \le T) \\ 0 & (\text{otherwise}) \end{cases}$$

b Repeat (a) when

$$g(t) = \begin{cases} A \sin(\pi t / T) & (0 \le t \le T) \\ 0 & (\text{otherwise}) \end{cases}$$

c Compare the spectra obtained in (a) and (b) in terms of the 3 dB bandwidth and
the bandwidth to the first spectral zero.

FIGURE P4-18

4-16 The random process $v(t)$ is defined as

$$v(t) = X \cos 2\pi f_c t - Y \sin 2\pi f_c t$$

where X and Y are random variables. Show that $v(t)$ is wide-sense stationary if
and only if $E(X) = E(Y) = 0$, $E(X^2) = E(Y^2)$, and $E(XY) = 0$.

4-17 Carry out the Gram-Schmidt orthogonalization of the signals in Fig. 4-2-1(a) in
the order $s_4(t)$, $s_3(t)$, $s_1(t)$, and, thus, obtain a set of orthonormal functions $\{f_m(t)\}$.
Then, determine the vector representation of the signals $\{s_n(t)\}$ by using the
orthonormal functions $\{f_m(t)\}$. Also, determine the signal energies.

4-18 Determine the signal space representation of the four signals $s_k(t)$, $k = 1, 2, 3, 4$,
shown in Fig. P4-18, by using as basis functions the orthonormal functions $f_1(t)$ and
$f_2(t)$. Plot the signal space diagram and show that this signal set is equivalent to
that for a four-phase PSK signal.

4-19 The power density spectrum of the cyclostationary process

$$v(t) = \sum_{n=-\infty}^{\infty} I_n g(t - nT)$$

was derived in Section 4-4-1 by averaging the autocorrelation function $\phi_{vv}(t + \tau; t)$
over the period T of the process and then evaluating the Fourier transform of the
average autocorrelation function. An alternative approach is to change the
cyclostationary process into a stationary process $v_\Delta(t)$ by adding a random variable
$\Delta$, uniformly distributed over $0 \le \Delta < T$, so that

$$v_\Delta(t) = \sum_{n=-\infty}^{\infty} I_n g(t - nT - \Delta)$$

and defining the spectral density of $v(t)$ as the Fourier transform of the
autocorrelation function of the stationary process $v_\Delta(t)$. Derive the result in
(4-4-11) by evaluating the autocorrelation function of $v_\Delta(t)$ and its Fourier
transform.

4-20 A PAM partial response signal (PRS) is generated as shown in Fig. P4-20 by
exciting an ideal lowpass filter of bandwidth W by the sequence

$$B_n = I_n + I_{n-1}$$

at a rate 1/T = 2W symbols/s. The sequence $\{I_n\}$ consists of binary digits selected
independently from the alphabet {1, -1} with equal probability. Hence, the filtered
signal has the form

$$v(t) = \sum_{n=-\infty}^{\infty} B_n g(t - nT), \qquad T = \frac{1}{2W}$$

a Sketch the signal space diagram for $v(t)$ and determine the probability of
occurrence of each symbol.

b Determine the autocorrelation and power density spectrum of the three-level
sequence $\{B_n\}$.

c The signal points of the sequence $\{B_n\}$ form a Markov chain. Sketch this Markov
chain and indicate the transition probabilities among the states.

4-21 The lowpass equivalent representation of a PAM signal is

$$u(t) = \sum_n I_n g(t - nT)$$

Suppose $g(t)$ is a rectangular pulse and

$$I_n = a_n - a_{n-2}$$

where $\{a_n\}$ is a sequence of uncorrelated binary-valued (1, -1) random variables
that occur with equal probability.

a Determine the autocorrelation function of the sequence $\{I_n\}$.
b Determine the power density spectrum of $u(t)$.
c Repeat (b) if the possible values of the $a_n$ are (0, 1).

4-22 Show that $x(t) = s(t) \cos 2\pi f_c t \pm \hat{s}(t) \sin 2\pi f_c t$ is a single-sideband signal, where $s(t)$
is band-limited to B Hz and $\hat{s}(t)$ is its Hilbert transform.

4-23 Use the results in Section 4-4-3 to determine the power density spectrum of the
binary FSK signals in which the waveforms are

$$s_i(t) = \sin \omega_i t, \qquad i = 1, 2, \quad 0 \le t \le T$$

where $\omega_1 = n\pi/T$ and $\omega_2 = m\pi/T$, $n \ne m$, and m and n are arbitrary positive
integers. Assume that $p_1 = p_2 = \tfrac12$. Sketch the spectrum and compare this result
with the spectrum of the MSK signal.

4-24 Use the results in Section 4-4-3 to determine the power density spectrum of
multitone FSK (MFSK) signals for which the signal waveforms are

$$s_n(t) = \sin \frac{2\pi n t}{T}, \qquad n = 1, 2, \ldots, M, \quad 0 \le t \le T$$

Assume that the probabilities $p_i = 1/M$ for all i. Sketch the power spectral density.

4-25 A quadrature partial response signal (QPRS) is generated by two separate partial
response signals of the type described in Problem 4-20 placed in phase quadrature.
Hence, the QPRS is represented as

$$s(t) = \mathrm{Re}\left[ v(t) e^{j 2\pi f_c t} \right]$$

where

$$v(t) = v_c(t) + j v_s(t) = \sum_n B_n u(t - nT) + j \sum_n C_n u(t - nT)$$

and $B_n = I_n + I_{n-1}$ and $C_n = J_n + J_{n-1}$. The sequences $\{B_n\}$ and $\{C_n\}$ are
uncorrelated and $I_n = \pm 1$, $J_n = \pm 1$ with equal probability.

a Sketch the signal space diagram for the QPRS signal and determine the
probability of occurrence of each symbol.
b Determine the autocorrelations and power spectral density of $v_c(t)$, $v_s(t)$, and
$v(t)$.

c Sketch the Markov chain model and indicate the transition probabilities for the
QPRS.

4-26 Determine the autocorrelation functions for the MSK and offset QPSK modulated 
signals based on the assumption that the information sequences for each of the 
two signals are uncorrelated and zero-mean. 

4-27 Sketch the phase tree, the state trellis, and the state diagram for partial response
CPM with $h = \tfrac12$ and

$$u(t) = \begin{cases} 1/4T & (0 \le t \le 2T) \\ 0 & (\text{otherwise}) \end{cases}$$


4-28 Determine the number of terminal phase states in the state trellis diagram for
a a full response binary CPFSK with either $h = \tfrac23$ or $\tfrac34$;
b a partial response L = 3 binary CPFSK with either $h = \tfrac23$ or $\tfrac34$.

4-29 Show that 16 QAM can be represented as a superposition of two four-phase
constant envelope signals where each component is amplified separately before
summing, i.e.

$$s(t) = G[A_n \cos 2\pi f_c t + B_n \sin 2\pi f_c t] + [C_n \cos 2\pi f_c t + D_n \sin 2\pi f_c t]$$

where $\{A_n\}$, $\{B_n\}$, $\{C_n\}$, and $\{D_n\}$ are statistically independent binary sequences
with elements from the set {+1, -1} and G is the amplifier gain. Thus, show that
the resulting signal is equivalent to

$$s(t) = I_n \cos 2\pi f_c t + Q_n \sin 2\pi f_c t$$

and determine $I_n$ and $Q_n$ in terms of $A_n$, $B_n$, $C_n$, and $D_n$.

4-30 Use the result in (4-4-60) to derive the expression for the power density spectrum
of memoryless linear modulation given by (4-4-18) under the condition that

$$s_k(t) = I_k s(t), \qquad k = 1, 2, \ldots, K$$

where $I_k$ is one of the K possible transmitted symbols that occur with equal
probability.

4-31 Show that a sufficient condition for the absence of the line spectrum component in
(4-4-60) is

$$\sum_{i=1}^{K} p_i s_i(t) = 0$$

Is this condition necessary? Justify your answer.

4-32 The information sequence $\{a_n\}_{n=-\infty}^{\infty}$ is a sequence of iid random variables, each
taking values +1 and -1 with equal probability. This sequence is to be transmitted
at baseband by a biphase coding scheme, described by

$$s(t) = \sum_{n=-\infty}^{\infty} a_n g(t - nT)$$

where $g(t)$ is shown in Fig. P4-32.
a Find the power spectral density of $s(t)$.

b Assume that it is desirable to have a zero in the power spectrum at f = 1/T. To
this end, we use a precoding scheme by introducing $b_n = a_n + k a_{n-1}$, where k is
some constant, and then transmit the $\{b_n\}$ sequence using the same $g(t)$. Is it
possible to choose k to produce a frequency null at f = 1/T? If yes, what are the
appropriate value and the resulting power spectrum?
c Now assume we want to have zeros at all multiples of $f_0 = 1/4T$. Is it possible to
have these zeros with an appropriate choice of k in the previous part? If not,
then what kind of precoding do you suggest to result in the desired nulls?

4-33 Starting with the definition of the transition probability matrix for delay
modulation given in (4-4-66), demonstrate that the relation

$$\mathbf{P}^4 \boldsymbol{\rho} = -\tfrac14 \boldsymbol{\rho}$$

holds, and, hence,

$$\mathbf{P}^{k+4} \boldsymbol{\rho} = -\tfrac14 \mathbf{P}^k \boldsymbol{\rho}, \qquad k \ge 1$$

FIGURE P4-32

4-34 The two signal waveforms for binary FSK signal transmission with discontinuous
phase are

$$s_0(t) = \sqrt{\frac{2\mathcal{E}_b}{T}} \cos\left[ 2\pi\left( f_c - \frac{\Delta f}{2} \right) t + \theta_0 \right], \qquad 0 \le t < T$$

$$s_1(t) = \sqrt{\frac{2\mathcal{E}_b}{T}} \cos\left[ 2\pi\left( f_c + \frac{\Delta f}{2} \right) t + \theta_1 \right], \qquad 0 \le t < T$$

where $\Delta f = 1/T \ll f_c$, and $\theta_0$ and $\theta_1$ are uniformly distributed random variables on
the interval $(0, 2\pi)$. The signals $s_0(t)$ and $s_1(t)$ are equally probable.
a Determine the power spectral density of the FSK signal.
b Show that the power spectral density decays as $1/f^2$ for $f \gg f_c$.



OPTIMUM RECEIVERS FOR 
THE ADDITIVE WHITE 
GAUSSIAN NOISE 
CHANNEL 


In Chapter 4, we described various types of modulation methods that may be
used to transmit digital information through a communication channel. As we
have observed, the modulator at the transmitter performs the function of 
mapping the digital sequence into signal waveforms. 

This chapter deals with the design and performance characteristics of 
optimum receivers for the various modulation methods, when the channel 
corrupts the transmitted signal by the addition of gaussian noise. In Section 
5-1, we first treat memoryless modulation signals, followed by modulation 
signals with memory. We evaluate the probability of error of the various 
modulation methods in Section 5-2. We treat the optimum receiver for CPM 
signals and its performance in Section 5-3. In Section 5-4, we derive the 
optimum receiver when the carrier phase of the signals is unknown at the 
receiver and is treated as a random variable. Finally, in Section 5-5, we 
consider the use of regenerative repeaters in signal transmission and carry out 
a link budget analysis for radio channels. 


5-1 OPTIMUM RECEIVER FOR SIGNALS 
CORRUPTED BY ADDITIVE WHITE 
GAUSSIAN NOISE 

Let us begin by developing a mathematical model for the signal at the input to
the receiver. We assume that the transmitter sends digital information by use
of M signal waveforms $\{s_m(t), m = 1, 2, \ldots, M\}$. Each waveform is transmitted
within the symbol (signaling) interval of duration T. To be specific, we consider
the transmission of information over the interval $0 \le t \le T$.

FIGURE 5-1-1 Model for received signal passed through an AWGN channel: the
noise $n(t)$ is added to the transmitted signal $s_m(t)$ to give the received
signal $r(t) = s_m(t) + n(t)$.


The channel is assumed to corrupt the signal by the addition of white
gaussian noise, as illustrated in Fig. 5-1-1. Thus, the received signal in the
interval $0 \le t \le T$ may be expressed as

$$r(t) = s_m(t) + n(t), \qquad 0 \le t \le T \tag{5-1-1}$$

where $n(t)$ denotes a sample function of the additive white gaussian noise
(AWGN) process with power spectral density $\Phi_{nn}(f) = \tfrac12 N_0$ W/Hz. Based on
the observation of $r(t)$ over the signal interval, we wish to design a receiver
that is optimum in the sense that it minimizes the probability of making an
error.

It is convenient to subdivide the receiver into two parts — the signal 
demodulator and the detector — as shown in Fig. 5-1-2. The function of the 
signal demodulator is to convert the received waveform $r(t)$ into an
N-dimensional vector $\mathbf{r} = [r_1\ r_2\ \ldots\ r_N]$, where N is the dimension of the
transmitted signal waveforms. The function of the detector is to decide which
of the M possible signal waveforms was transmitted based on the vector $\mathbf{r}$.

Two realizations of the signal demodulator are described in the next two 
sections. One is based on the use of signal correlators. The second is based on 
the use of matched filters. The optimum detector that follows the signal 
demodulator is designed to minimize the probability of error. 

5-1-1 Correlation Demodulator 

In this section, we describe a correlation demodulator that decomposes the
received signal and the noise into N-dimensional vectors. In other words, the
signal and the noise are expanded into a series of linearly weighted
orthonormal basis functions $\{f_n(t)\}$. It is assumed that the N basis functions
$\{f_n(t)\}$ span the signal space, so that every one of the possible transmitted
signals of the set $\{s_m(t), 1 \le m \le M\}$ can be represented as a weighted linear
combination of $\{f_n(t)\}$. In the case of the noise, the functions $\{f_n(t)\}$ do not span
the noise space. However, we show below that the noise terms that fall outside
the signal space are irrelevant to the detection of the signal.

FIGURE 5-1-2 Receiver configuration.

Suppose the received signal $r(t)$ is passed through a parallel bank of N
crosscorrelators which basically compute the projection of $r(t)$ onto the N basis
functions $\{f_n(t)\}$, as illustrated in Fig. 5-1-3. Thus, we have

$$\int_0^T r(t) f_k(t)\, dt = \int_0^T [s_m(t) + n(t)] f_k(t)\, dt$$

$$r_k = s_{mk} + n_k, \qquad k = 1, 2, \ldots, N \tag{5-1-2}$$

where

$$s_{mk} = \int_0^T s_m(t) f_k(t)\, dt, \qquad k = 1, 2, \ldots, N$$

$$n_k = \int_0^T n(t) f_k(t)\, dt, \qquad k = 1, 2, \ldots, N \tag{5-1-3}$$

The signal is now represented by the vector $\mathbf{s}_m$ with components $s_{mk}$,
$k = 1, 2, \ldots, N$. Their values depend on which of the M signals was
transmitted. The components $\{n_k\}$ are random variables that arise from the presence
of the additive noise.
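The projection performed by the bank of crosscorrelators can be sketched in discrete time by replacing each integral with a Riemann sum. The example below is only an illustration: the two basis functions (rectangles on the two halves of the symbol interval), the sample count, and the helper name `correlate` are assumptions chosen for convenience, and the noise is omitted so that the outputs can be compared with the known signal components.

```python
import math


def correlate(r, basis, dt):
    """Approximate r_k = integral of r(t) f_k(t) dt for every basis function."""
    return [sum(ri * fi for ri, fi in zip(r, f)) * dt for f in basis]


T = 1.0
n = 1000                      # samples per symbol interval (assumed)
dt = T / n
amp = math.sqrt(2.0 / T)      # makes each half-interval rectangle unit energy

# Hypothetical orthonormal basis: f1 occupies [0, T/2), f2 occupies [T/2, T).
f1 = [amp if i < n // 2 else 0.0 for i in range(n)]
f2 = [0.0 if i < n // 2 else amp for i in range(n)]

# Noise-free received waveform with components s_m1 = 3 and s_m2 = -2.
s = [3.0 * a - 2.0 * b for a, b in zip(f1, f2)]
r_vec = correlate(s, [f1, f2], dt)
```

In this noise-free case the correlator outputs reproduce the signal components, so `r_vec` should be close to `[3.0, -2.0]`. Adding noise samples to `s` would add the terms $n_k$ to each output.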

In fact, we can express the received signal $r(t)$ in the interval $0 \le t \le T$ as

$$r(t) = \sum_{k=1}^{N} s_{mk} f_k(t) + \sum_{k=1}^{N} n_k f_k(t) + n'(t) = \sum_{k=1}^{N} r_k f_k(t) + n'(t) \tag{5-1-4}$$

The term $n'(t)$, defined as

$$n'(t) = n(t) - \sum_{k=1}^{N} n_k f_k(t) \tag{5-1-5}$$

is a zero-mean gaussian noise process that represents the difference between
the original noise process $n(t)$ and the part corresponding to the projection of
$n(t)$ onto the basis functions $\{f_k(t)\}$. We shall show below that $n'(t)$ is
irrelevant to the decision as to which signal was transmitted. Consequently, the
decision may be based entirely on the correlator output signal and noise
components $r_k = s_{mk} + n_k$, $k = 1, 2, \ldots, N$.

Since the signals $\{s_m(t)\}$ are deterministic, the signal components are
deterministic. The noise components $\{n_k\}$ are gaussian. Their mean values are

$$E(n_k) = \int_0^T E[n(t)] f_k(t)\, dt = 0 \tag{5-1-6}$$

for all k. Their covariances are

$$E(n_k n_m) = \int_0^T \int_0^T E[n(t) n(\tau)] f_k(t) f_m(\tau)\, dt\, d\tau$$

$$= \tfrac12 N_0 \int_0^T \int_0^T \delta(t - \tau) f_k(t) f_m(\tau)\, dt\, d\tau$$

$$= \tfrac12 N_0 \int_0^T f_k(t) f_m(t)\, dt$$

$$= \tfrac12 N_0 \delta_{mk} \tag{5-1-7}$$

where $\delta_{mk} = 1$ when m = k and zero otherwise. Therefore, the N noise
components $\{n_k\}$ are zero-mean uncorrelated gaussian random variables with a
common variance $\sigma_n^2 = \tfrac12 N_0$.

FIGURE 5-1-3 Correlation-type demodulator.

From the above development, it follows that the correlator outputs $\{r_k\}$
conditioned on the mth signal being transmitted are gaussian random variables
with mean

$$E(r_k) = E(s_{mk} + n_k) = s_{mk} \tag{5-1-8}$$

and equal variance

$$\sigma_r^2 = \sigma_n^2 = \tfrac12 N_0 \tag{5-1-9}$$

Since the noise components $\{n_k\}$ are uncorrelated gaussian random variables,
they are also statistically independent. As a consequence, the correlator
outputs $\{r_k\}$ conditioned on the mth signal being transmitted are statistically
independent gaussian variables. Hence, the conditional probability density
functions of the random variables $[r_1\ r_2\ \ldots\ r_N] = \mathbf{r}$ are simply

$$p(\mathbf{r} \mid \mathbf{s}_m) = \prod_{k=1}^{N} p(r_k \mid s_{mk}), \qquad m = 1, 2, \ldots, M \tag{5-1-10}$$

where

$$p(r_k \mid s_{mk}) = \frac{1}{\sqrt{\pi N_0}} \exp\left[ -\frac{(r_k - s_{mk})^2}{N_0} \right], \qquad k = 1, 2, \ldots, N \tag{5-1-11}$$

By substituting (5-1-11) into (5-1-10), we obtain the joint conditional pdfs

$$p(\mathbf{r} \mid \mathbf{s}_m) = \frac{1}{(\pi N_0)^{N/2}} \exp\left[ -\sum_{k=1}^{N} \frac{(r_k - s_{mk})^2}{N_0} \right], \qquad m = 1, 2, \ldots, M \tag{5-1-12}$$
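To make the joint conditional pdf concrete, the sketch below evaluates its logarithm, assuming the form $p(\mathbf{r} \mid \mathbf{s}_m) = (\pi N_0)^{-N/2} \exp[-\sum_k (r_k - s_{mk})^2 / N_0]$. The signal vectors, the received vector, the value of $N_0$, and the name `log_pdf` are hypothetical and serve only to show how the density depends on the Euclidean distance between $\mathbf{r}$ and $\mathbf{s}_m$.

```python
import math


def log_pdf(r, s, n0):
    """Natural log of p(r | s_m) for N-dimensional vectors r and s_m."""
    dist2 = sum((rk - sk) ** 2 for rk, sk in zip(r, s))
    return -0.5 * len(r) * math.log(math.pi * n0) - dist2 / n0


# Hypothetical two-dimensional signal vectors and one received vector.
signals = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
r = (0.2, 0.9)

scores = [log_pdf(r, s, n0=0.5) for s in signals]
# Index of the signal vector under which the observation is most probable.
best = max(range(len(signals)), key=scores.__getitem__)
```

Because the exponent depends on $\mathbf{r}$ and $\mathbf{s}_m$ only through the squared distance, the density is largest for the signal vector closest to $\mathbf{r}$, which in this example is the second one.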

As a final point we wish to show that the correlator outputs $(r_1, r_2, \ldots, r_N)$
are sufficient statistics for reaching a decision on which of the M signals was
transmitted, i.e., that no additional relevant information can be extracted from
the remaining noise process $n'(t)$. Indeed, $n'(t)$ is uncorrelated with the N
correlator outputs $\{r_k\}$, i.e.,

$$E[n'(t) r_k] = E[n'(t)] s_{mk} + E[n'(t) n_k]$$

$$= E[n'(t) n_k]$$

$$= E\left\{ \left[ n(t) - \sum_{j=1}^{N} n_j f_j(t) \right] n_k \right\}$$

$$= \int_0^T E[n(t) n(\tau)] f_k(\tau)\, d\tau - \sum_{j=1}^{N} E(n_j n_k) f_j(t)$$

$$= \tfrac12 N_0 f_k(t) - \tfrac12 N_0 f_k(t) = 0 \tag{5-1-13}$$

Since $n'(t)$ and $\{r_k\}$ are gaussian and uncorrelated, they are also statistically
independent. Consequently, $n'(t)$ does not contain any information that is
relevant to the decision as to which signal waveform was transmitted. All the
relevant information is contained in the correlator outputs $\{r_k\}$. Hence, $n'(t)$
may be ignored.


Example 5-1-1 

Consider an M-ary baseband PAM signal set in which the basic pulse shape
$g(t)$ is rectangular as shown in Fig. 5-1-4. The additive noise is a zero-mean
white gaussian noise process. Let us determine the basis function $f(t)$ and
the output of the correlation-type demodulator. The energy in the rectangular
pulse is

$$\mathcal{E}_g = \int_0^T g^2(t)\, dt = \int_0^T a^2\, dt = a^2 T$$

FIGURE 5-1-4 Signal pulse for Example 5-1-1.

Since the PAM signal set has dimension N = 1, there is only one basis
function $f(t)$. This is given as

$$f(t) = \frac{1}{\sqrt{a^2 T}}\, g(t) = \begin{cases} 1/\sqrt{T} & (0 \le t \le T) \\ 0 & (\text{otherwise}) \end{cases}$$

The output of the correlation-type demodulator is

$$r = \int_0^T r(t) f(t)\, dt = \frac{1}{\sqrt{T}} \int_0^T r(t)\, dt$$

It is interesting to note that the correlator becomes a simple integrator when
$f(t)$ is rectangular. If we substitute for $r(t)$, we obtain

$$r = \frac{1}{\sqrt{T}} \int_0^T [s_m(t) + n(t)]\, dt = \frac{1}{\sqrt{T}} \left[ \int_0^T s_m(t)\, dt + \int_0^T n(t)\, dt \right]$$

$$r = s_m + n$$

where the noise term has $E(n) = 0$ and

$$\sigma_n^2 = E\left[ \frac{1}{T} \int_0^T \int_0^T n(t) n(\tau)\, dt\, d\tau \right] = \frac{1}{T} \int_0^T \int_0^T E[n(t) n(\tau)]\, dt\, d\tau = \frac{N_0}{2T} \int_0^T \int_0^T \delta(t - \tau)\, dt\, d\tau = \tfrac12 N_0$$

The probability density function for the sampled output is

$$p(r \mid s_m) = \frac{1}{\sqrt{\pi N_0}} \exp\left[ -\frac{(r - s_m)^2}{N_0} \right]$$
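The integrator of this example can be imitated with a small Monte Carlo experiment. The sketch below makes several assumptions that are not in the text: the waveform is sampled at `steps` points per symbol, white noise of spectral density $\tfrac12 N_0$ is approximated by independent gaussian samples of variance $N_0 / (2\,\Delta t)$, the transmitted pulse is `amplitude * a` over the whole interval, and the seed and trial count are arbitrary.

```python
import math
import random


def integrate_and_dump(samples, dt, T):
    """r = (1/sqrt(T)) * integral of r(t) dt, approximated by a Riemann sum."""
    return sum(samples) * dt / math.sqrt(T)


def simulate(amplitude, a=1.0, T=1.0, n0=1.0, steps=50, trials=2000, seed=1):
    """Return the sample mean and sample variance of the integrator output."""
    rng = random.Random(seed)
    dt = T / steps
    # Discrete-time stand-in for white noise with spectral density N0/2.
    sigma = math.sqrt(n0 / (2.0 * dt))
    outputs = []
    for _ in range(trials):
        rx = [amplitude * a + rng.gauss(0.0, sigma) for _ in range(steps)]
        outputs.append(integrate_and_dump(rx, dt, T))
    mean = sum(outputs) / trials
    var = sum((x - mean) ** 2 for x in outputs) / (trials - 1)
    return mean, var


mean, var = simulate(amplitude=3.0)
noise_free = integrate_and_dump([3.0] * 50, 1.0 / 50, 1.0)
```

Under these assumptions the sample mean should lie near $s_m = 3 a \sqrt{T} = 3$ and the sample variance near $\tfrac12 N_0 = 0.5$, up to the statistical fluctuation expected from a finite number of trials.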


5-1-2 Matched-Filter Demodulator 

Instead of using a bank of N correlators to generate the variables $\{r_k\}$, we may
use a bank of N linear filters. To be specific, let us suppose that the impulse
responses of the N filters are

$$h_k(t) = f_k(T - t), \qquad 0 \le t \le T \tag{5-1-14}$$

where $\{f_k(t)\}$ are the N basis functions and $h_k(t) = 0$ outside of the interval
$0 \le t \le T$. The outputs of these filters are

$$y_k(t) = \int_0^t r(\tau) h_k(t - \tau)\, d\tau = \int_0^t r(\tau) f_k(T - t + \tau)\, d\tau, \qquad k = 1, 2, \ldots, N \tag{5-1-15}$$

Now, if we sample the outputs of the filters at t = T, we obtain

$$y_k(T) = \int_0^T r(\tau) f_k(\tau)\, d\tau = r_k, \qquad k = 1, 2, \ldots, N \tag{5-1-16}$$

Hence, the sampled outputs of the filters at time t = T are exactly the set of
values $\{r_k\}$ obtained from the N linear correlators.

FIGURE 5-1-5 Signal $s(t)$ and filter matched to $s(t)$: (a) signal $s(t)$;
(b) impulse response of filter matched to $s(t)$.

A filter whose impulse response $h(t) = s(T - t)$, where $s(t)$ is assumed to be
confined to the time interval $0 \le t \le T$, is called the matched filter to the signal
$s(t)$. An example of a signal and its matched filter are shown in Fig. 5-1-5. The
response of $h(t) = s(T - t)$ to the signal $s(t)$ is

$$y(t) = \int_0^t s(\tau) s(T - t + \tau)\, d\tau \tag{5-1-17}$$

which is basically the time-autocorrelation function of the signal $s(t)$. Figure
5-1-6 illustrates $y(t)$ for the triangular signal pulse shown in Fig. 5-1-5. Note
that the autocorrelation function $y(t)$ is symmetric about t = T (it is an even
function of the lag t - T) and attains its peak at t = T.
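The behaviour of the matched-filter output can be checked numerically with a discrete convolution. The sketch below assumes a ramp-shaped pulse sampled at `n` points as a stand-in for the pulse in the figure, and the function name is hypothetical. It convolves the pulse with the time-reversed copy $h(t) = s(T - t)$ and compares the sample at $t = T$ with the pulse energy.

```python
def matched_filter_output(s, dt):
    """Convolve s with h[i] = s[n-1-i], the sampled version of h(t) = s(T - t)."""
    n = len(s)
    h = s[::-1]
    y = []
    for k in range(2 * n - 1):
        lo, hi = max(0, k - n + 1), min(k, n - 1)
        y.append(sum(s[j] * h[k - j] for j in range(lo, hi + 1)) * dt)
    return y


n, T = 200, 1.0
dt = T / n
# Assumed example pulse: a ramp rising from 0 to 1 over the symbol interval.
s = [(i + 0.5) * dt / T for i in range(n)]

y = matched_filter_output(s, dt)
energy = sum(v * v for v in s) * dt
# Output index n - 1 corresponds to the sampling instant t = T.
peak_index = max(range(len(y)), key=y.__getitem__)
```

In this sketch the largest output sample falls at index `n - 1`, its value equals the pulse energy, and samples equally spaced on either side of it agree, which mirrors the symmetry of the autocorrelation function about $t = T$.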

In the case of the demodulator described above, the N matched filters are
matched to the basis functions $\{f_k(t)\}$. Figure 5-1-7 illustrates the matched filter
demodulator that generates the observed variables $\{r_k\}$.

FIGURE 5-1-6 The matched filter output is the autocorrelation function of $s(t)$.

FIGURE 5-1-7 Matched filter demodulator.


Properties of the Matched Filter A matched filter has some interesting
properties. Let us prove the most important property, which may be stated as
follows: If a signal $s(t)$ is corrupted by AWGN, the filter with impulse response
matched to $s(t)$ maximizes the output signal-to-noise ratio (SNR).

To prove this property, let us assume that the received signal $r(t)$ consists of
the signal $s(t)$ and AWGN $n(t)$ which has zero mean and power spectral
density $\Phi_{nn}(f) = \tfrac12 N_0$ W/Hz. Suppose the signal $r(t)$ is passed through a filter
with impulse response $h(t)$, $0 \le t \le T$, and its output is sampled at time t = T.
The filter response to the signal and noise components is

$$y(t) = \int_0^t r(\tau) h(t - \tau)\, d\tau = \int_0^t s(\tau) h(t - \tau)\, d\tau + \int_0^t n(\tau) h(t - \tau)\, d\tau \tag{5-1-18}$$

At the sampling instant t = T, the signal and noise components are

$$y(T) = \int_0^T s(\tau) h(T - \tau)\, d\tau + \int_0^T n(\tau) h(T - \tau)\, d\tau = y_s(T) + y_n(T) \tag{5-1-19}$$

where $y_s(T)$ represents the signal component and $y_n(T)$ the noise component.
The problem is to select the filter impulse response that maximizes the output
signal-to-noise ratio ($\mathrm{SNR}_0$) defined as

$$\mathrm{SNR}_0 = \frac{y_s^2(T)}{E[y_n^2(T)]} \tag{5-1-20}$$

The denominator in (5-1-20) is simply the variance of the noise term at the
output of the filter. Let us evaluate $E[y_n^2(T)]$. We have

$$E[y_n^2(T)] = \int_0^T \int_0^T E[n(\tau) n(t)] h(T - \tau) h(T - t)\, dt\, d\tau$$

$$= \tfrac12 N_0 \int_0^T \int_0^T \delta(t - \tau) h(T - \tau) h(T - t)\, dt\, d\tau$$

$$= \tfrac12 N_0 \int_0^T h^2(T - t)\, dt \tag{5-1-21}$$

Note that the variance depends on the power spectral density of the noise and
the energy in the impulse response $h(t)$.

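By substituting for $y_s(T)$ and $E[y_n^2(T)]$ into (5-1-20), we obtain the
expression for the output SNR as

$$\mathrm{SNR}_0 = \frac{\left[ \int_0^T s(\tau) h(T - \tau)\, d\tau \right]^2}{\tfrac12 N_0 \int_0^T h^2(T - t)\, dt} = \frac{\left[ \int_0^T h(\tau) s(T - \tau)\, d\tau \right]^2}{\tfrac12 N_0 \int_0^T h^2(T - t)\, dt} \tag{5-1-22}$$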

Since the denominator of the SNR depends on the energy in $h(t)$, the
maximum output SNR over $h(t)$ is obtained by maximizing the numerator
subject to the constraint that the denominator is held constant. The
maximization of the numerator is most easily performed by use of the Cauchy-Schwarz
inequality, which states, in general, that if $g_1(t)$ and $g_2(t)$ are finite-energy
signals then

$$\left[ \int_{-\infty}^{\infty} g_1(t) g_2(t)\, dt \right]^2 \le \int_{-\infty}^{\infty} g_1^2(t)\, dt \int_{-\infty}^{\infty} g_2^2(t)\, dt \tag{5-1-23}$$

with equality when $g_1(t) = C g_2(t)$ for any arbitrary constant C. If we set
$g_1(t) = h(t)$ and $g_2(t) = s(T - t)$, it is clear that the SNR is maximized when
$h(t) = C s(T - t)$, i.e., $h(t)$ is matched to the signal $s(t)$. The scale factor $C^2$
drops out of the expression for the SNR, since it appears in both the
numerator and the denominator.

The output (maximum) SNR obtained with the matched filter is

$$\mathrm{SNR}_0 = \frac{2}{N_0} \int_0^T s^2(t)\, dt = \frac{2\mathcal{E}}{N_0} \tag{5-1-24}$$

Note that the output SNR from the matched filter depends on the energy of
the waveform $s(t)$ but not on the detailed characteristics of $s(t)$. This is another
interesting property of the matched filter.
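The maximization can be illustrated numerically by evaluating the output SNR for a matched and a mismatched filter. The sketch below assumes the ratio $[\int_0^T s(\tau) h(T - \tau)\, d\tau]^2 / (\tfrac12 N_0 \int_0^T h^2(T - t)\, dt)$, a ramp-shaped pulse, and a rectangular filter as the mismatched alternative. All names and numerical values are illustrative.

```python
def output_snr(s, h_rev, n0, dt):
    """Output SNR at t = T; h_rev[i] holds samples of h(T - t) on the grid of s."""
    num = (sum(a * b for a, b in zip(s, h_rev)) * dt) ** 2
    den = 0.5 * n0 * sum(b * b for b in h_rev) * dt
    return num / den


n, T, n0 = 400, 1.0, 0.5
dt = T / n
s = [(i + 0.5) * dt / T for i in range(n)]        # assumed ramp pulse
energy = sum(v * v for v in s) * dt

snr_matched = output_snr(s, s, n0, dt)            # h(T - t) = s(t)
snr_scaled = output_snr(s, [5.0 * v for v in s], n0, dt)   # h(T - t) = 5 s(t)
snr_rect = output_snr(s, [1.0] * n, n0, dt)       # rectangular filter instead
```

In this example the matched filter attains $2\mathcal{E}/N_0$, scaling the impulse response leaves the SNR unchanged, and the rectangular filter gives a smaller value, as the Cauchy-Schwarz argument predicts.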


Frequency-Domain Interpretation of the Matched Filter The matched
filter has an interesting frequency-domain interpretation. Since $h(t) = s(T - t)$,
the Fourier transform of this relationship is

$$H(f) = \int_0^T s(T - t) e^{-j 2\pi f t}\, dt = \left[ \int_0^T s(\tau) e^{j 2\pi f \tau}\, d\tau \right] e^{-j 2\pi f T} = S^*(f) e^{-j 2\pi f T} \tag{5-1-25}$$

We observe that the matched filter has a frequency response that is the
complex conjugate of the transmitted signal spectrum multiplied by the phase
factor $e^{-j 2\pi f T}$, which represents the sampling delay of T. In other words,
$|H(f)| = |S(f)|$, so that the magnitude response of the matched filter is identical
to the transmitted signal spectrum. On the other hand, apart from the linear
phase term $-2\pi f T$ due to the delay, the phase of $H(f)$ is the negative of the
phase of $S(f)$.
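The frequency-domain relation can be checked at a single frequency by approximating the Fourier integrals with Riemann sums. In the sketch below the ramp pulse, the sample count, the test frequency, and the helper name `fourier` are all assumptions. The samples are taken at the midpoints $t = (i + \tfrac12)\Delta t$ so that reversing the list corresponds to $h(t) = s(T - t)$ on the same grid.

```python
import cmath
import math


def fourier(x, dt, f):
    """Riemann-sum approximation of the Fourier transform of x(t) at frequency f."""
    return sum(v * cmath.exp(-2j * math.pi * f * (i + 0.5) * dt)
               for i, v in enumerate(x)) * dt


n, T = 400, 1.0
dt = T / n
s = [(i + 0.5) * dt / T for i in range(n)]   # assumed real-valued ramp pulse
h = s[::-1]                                  # samples of h(t) = s(T - t)

f = 0.7                                      # arbitrary test frequency
H = fourier(h, dt, f)
S = fourier(s, dt, f)
expected = S.conjugate() * cmath.exp(-2j * math.pi * f * T)
```

For a real-valued pulse the two sides agree to rounding error in this discretization, and the magnitudes of `H` and `S` coincide, which is the statement $|H(f)| = |S(f)|$.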

Now, if the signal $s(t)$ with spectrum $S(f)$ is passed through the matched
filter, the filter output has a spectrum $Y(f) = |S(f)|^2 e^{-j 2\pi f T}$. Hence, the output
waveform is

$$y_s(t) = \int_{-\infty}^{\infty} Y(f) e^{j 2\pi f t}\, df = \int_{-\infty}^{\infty} |S(f)|^2 e^{-j 2\pi f T} e^{j 2\pi f t}\, df \tag{5-1-26}$$

By sampling the output of the matched filter at t = T, we obtain

$$y_s(T) = \int_{-\infty}^{\infty} |S(f)|^2\, df = \int_0^T s^2(t)\, dt = \mathcal{E} \tag{5-1-27}$$

where the last step follows from Parseval's relation.

The noise at the output of the matched filter has a power spectral density

$$\Phi_0(f) = \tfrac12 |H(f)|^2 N_0 \tag{5-1-28}$$

Hence, the total noise power at the output of the matched filter is

$$P_n = \int_{-\infty}^{\infty} \Phi_0(f)\, df = \tfrac12 N_0 \int_{-\infty}^{\infty} |H(f)|^2\, df = \tfrac12 N_0 \int_{-\infty}^{\infty} |S(f)|^2\, df = \tfrac12 \mathcal{E} N_0 \tag{5-1-29}$$

The output SNR is simply the ratio of the signal power $P_s$, given by 

$$
P_s = y_s^2(T)
\tag{5-1-30}
$$

to the noise power $P_n$. Hence, 

$$
\mathrm{SNR}_0 = \frac{P_s}{P_n}
  = \frac{\mathcal{E}^2}{\tfrac{1}{2} \mathcal{E} N_0}
  = \frac{2\mathcal{E}}{N_0}
\tag{5-1-31}
$$

which agrees with the result given by (5-1-24). 

FIGURE 5-1-8 Basis functions and matched filter responses for Example 5-1-2. 
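The frequency-domain result can be checked numerically in the time domain. The 
following is a minimal sketch, not part of the original text: it assumes a 
hypothetical half-sine pulse on $[0, T]$ with $T = 1$ and $N_0 = 0.5$, samples 
it at midpoints, and evaluates $y_s(T)$, $\mathcal{E}$, and the output SNR 
from (5-1-27), (5-1-29) and (5-1-31). The function name `matched_filter_snr` 
is illustrative only. 

```python
import math

def matched_filter_snr(samples, dt, n0):
    """Return (y_s(T), signal energy, output SNR) for a sampled pulse s(t)."""
    n = len(samples)
    h = samples[::-1]                      # h(t) = s(T - t)
    # y_s(T) = integral of s(tau) h(T - tau) d tau, cf. (5-1-27)
    y_T = sum(samples[k] * h[n - 1 - k] for k in range(n)) * dt
    energy = sum(x * x for x in samples) * dt
    # P_n = (N0 / 2) * integral of h^2(t) dt, cf. (5-1-29)
    noise_power = 0.5 * n0 * sum(x * x for x in h) * dt
    return y_T, energy, y_T ** 2 / noise_power

# Hypothetical half-sine pulse on [0, 1], sampled at interval midpoints.
N = 1000
dt = 1.0 / N
pulse = [math.sin(math.pi * (k + 0.5) * dt) for k in range(N)]
y_T, energy, snr = matched_filter_snr(pulse, dt, n0=0.5)
print(y_T, energy, snr, 2 * energy / 0.5)
```

Under these assumptions the sampled output equals the pulse energy and the SNR 
equals $2\mathcal{E}/N_0$, independent of the pulse shape chosen. 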


Example 5-1-2 


Consider the $M = 4$ biorthogonal signals shown in Fig. 5-1-8 for transmitting 
information over an AWGN channel. The noise is assumed to have zero 
mean and power spectral density $\frac{1}{2} N_0$. Let us determine the basis 
functions for this signal set, the impulse responses of the matched-filter 
demodulators, and the output waveforms of the matched-filter demodulators 
when the transmitted signal is $s_1(t)$. 

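The $M = 4$ biorthogonal signals have dimension $N = 2$. Hence, two basis 
functions are needed to represent the signals. From Fig. 5-1-8, we choose 
$f_1(t)$ and $f_2(t)$ as 

$$
\begin{aligned}
f_1(t) &= \begin{cases} \sqrt{2/T} & (0 \le t \le \tfrac{1}{2} T) \\ 0 & (\text{otherwise}) \end{cases} \\
f_2(t) &= \begin{cases} \sqrt{2/T} & (\tfrac{1}{2} T \le t \le T) \\ 0 & (\text{otherwise}) \end{cases}
\end{aligned}
\tag{5-1-32}
$$

These waveforms are illustrated in Fig. 5-1-8(a). The impulse responses of 
the two matched filters are 

$$
\begin{aligned}
h_1(t) = f_1(T - t) &= \begin{cases} \sqrt{2/T} & (\tfrac{1}{2} T \le t \le T) \\ 0 & (\text{otherwise}) \end{cases} \\
h_2(t) = f_2(T - t) &= \begin{cases} \sqrt{2/T} & (0 \le t \le \tfrac{1}{2} T) \\ 0 & (\text{otherwise}) \end{cases}
\end{aligned}
\tag{5-1-33}
$$

and are illustrated in Fig. 5-1-8(b). 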

If $s_1(t)$ is transmitted, the (noise-free) responses of the two matched 
filters are as shown in Fig. 5-1-8(c). Since $y_1(t)$ and $y_2(t)$ are sampled 
at $t = T$, we observe that $y_{1s}(T) = \sqrt{\frac{1}{2} A^2 T}$ and 
$y_{2s}(T) = 0$. Note that $\frac{1}{2} A^2 T = \mathcal{E}$, the signal 
energy. Hence, the received vector formed from the two matched filter outputs 
at the sampling instant $t = T$ is 

$$
\mathbf{r} = [r_1 \;\; r_2] = [\sqrt{\mathcal{E}} + n_1 \;\;\; n_2]
\tag{5-1-34}
$$

where $n_1 = y_{1n}(T)$ and $n_2 = y_{2n}(T)$ are the noise components at the 
outputs of the matched filters, given by 

$$
y_{kn}(T) = \int_0^T n(t) f_k(t)\, dt, \qquad k = 1, 2
\tag{5-1-35}
$$

Clearly, $E(n_k) = E[y_{kn}(T)] = 0$. Their variance is 

$$
\begin{aligned}
\sigma_n^2 = E[y_{kn}^2(T)]
  &= \int_0^T \!\! \int_0^T E[n(t) n(\tau)]\, f_k(t) f_k(\tau)\, dt\, d\tau \\
  &= \tfrac{1}{2} N_0 \int_0^T \!\! \int_0^T \delta(t - \tau)\, f_k(\tau) f_k(t)\, dt\, d\tau \\
  &= \tfrac{1}{2} N_0 \int_0^T f_k^2(t)\, dt = \tfrac{1}{2} N_0
\end{aligned}
\tag{5-1-36}
$$

Observe that the $\mathrm{SNR}_0$ for the first matched filter is 

$$
\mathrm{SNR}_0 = \frac{(\sqrt{\mathcal{E}})^2}{\tfrac{1}{2} N_0} = \frac{2\mathcal{E}}{N_0}
\tag{5-1-37}
$$

which agrees with our previous result. Also note that the four possible 
outputs of the two matched filters, corresponding to the four possible 
transmitted signals in Fig. 5-1-8, are 
$(r_1, r_2) = (\sqrt{\mathcal{E}} + n_1, n_2)$, $(n_1, \sqrt{\mathcal{E}} + n_2)$, 
$(-\sqrt{\mathcal{E}} + n_1, n_2)$ and $(n_1, -\sqrt{\mathcal{E}} + n_2)$. 
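The noise-free samples quoted in Example 5-1-2 can be reproduced by projecting 
$s_1(t)$ onto the two basis functions. The sketch below is illustrative only. 
It assumes that $s_1(t)$ is a rectangular pulse of amplitude $A$ on 
$0 \le t \le \frac{1}{2} T$ (the figure is not reproduced here, so this shape 
is an assumption chosen to be consistent with $\frac{1}{2} A^2 T = \mathcal{E}$), 
and it picks the arbitrary values $T = 2$ and $A = 3$. 

```python
import math

T, A = 2.0, 3.0   # arbitrary illustrative values

def f1(t):
    """Basis function f_1(t) of (5-1-32)."""
    return math.sqrt(2 / T) if 0 <= t <= T / 2 else 0.0

def f2(t):
    """Basis function f_2(t) of (5-1-32)."""
    return math.sqrt(2 / T) if T / 2 < t <= T else 0.0

def s1(t):
    """Assumed shape of s_1(t): amplitude A on the first half interval."""
    return A if 0 <= t <= T / 2 else 0.0

def project(signal, basis, steps=1000):
    """Midpoint-rule approximation of the integral of signal * basis on [0, T]."""
    dt = T / steps
    return sum(signal((k + 0.5) * dt) * basis((k + 0.5) * dt)
               for k in range(steps)) * dt

energy = 0.5 * A * A * T
y1 = project(s1, f1)   # noise-free y_1s(T)
y2 = project(s1, f2)   # noise-free y_2s(T)
print(y1, math.sqrt(energy), y2)
```

With these assumed values the first projection equals $\sqrt{\mathcal{E}}$ and 
the second is zero, in agreement with (5-1-34). 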


5-1-3 The Optimum Detector 

We have demonstrated that, for a signal transmitted over an AWGN channel, 
either a correlation demodulator or a matched filter demodulator produces the 
vector $\mathbf{r} = [r_1 \; r_2 \; \cdots \; r_N]$, which contains all the 
relevant information in the received signal waveform. In this section, we 
describe the optimum decision rule based on the observation vector 
$\mathbf{r}$. For this development, we assume that there is no memory in 
signals transmitted in successive signal intervals. 

We wish to design a signal detector that makes a decision on the transmitted 
signal in each signal interval based on the observation of the vector 
$\mathbf{r}$ in each interval such that the probability of a correct decision 
is maximized. With this goal in mind, we consider a decision rule based on the 
computation of the posterior probabilities defined as 

$$
P(\text{signal } \mathbf{s}_m \text{ was transmitted} \mid \mathbf{r}), \qquad m = 1, 2, \ldots, M
$$

which we abbreviate as $P(\mathbf{s}_m \mid \mathbf{r})$. The decision 
criterion is based on selecting the signal corresponding to the maximum of the 
set of posterior probabilities $\{P(\mathbf{s}_m \mid \mathbf{r})\}$. Later, we 
show that this criterion maximizes the probability of a correct decision and, 
hence, minimizes the probability of error. This decision criterion is called 
the maximum a posteriori probability (MAP) criterion. 

Using Bayes' rule, the posterior probabilities may be expressed as 

$$
P(\mathbf{s}_m \mid \mathbf{r}) = \frac{p(\mathbf{r} \mid \mathbf{s}_m)\, P(\mathbf{s}_m)}{p(\mathbf{r})}
\tag{5-1-38}
$$

where $p(\mathbf{r} \mid \mathbf{s}_m)$ is the conditional pdf of the observed 
vector given $\mathbf{s}_m$, and $P(\mathbf{s}_m)$ is the a priori probability 
of the $m$th signal being transmitted. The denominator of (5-1-38) may be 
expressed as 

$$
p(\mathbf{r}) = \sum_{m=1}^{M} p(\mathbf{r} \mid \mathbf{s}_m)\, P(\mathbf{s}_m)
\tag{5-1-39}
$$

From (5-1-38) and (5-1-39), we observe that the computation of the posterior 
probabilities $P(\mathbf{s}_m \mid \mathbf{r})$ requires knowledge of the a 
priori probabilities $P(\mathbf{s}_m)$ and the conditional pdfs 
$p(\mathbf{r} \mid \mathbf{s}_m)$ for $m = 1, 2, \ldots, M$. 

Some simplification occurs in the MAP criterion when the $M$ signals are 
equally probable a priori, i.e., $P(\mathbf{s}_m) = 1/M$ for all $m$. 
Furthermore, we note that the denominator in (5-1-38) is independent of which 
signal is transmitted. Consequently, the decision rule based on finding the 
signal that maximizes $P(\mathbf{s}_m \mid \mathbf{r})$ is equivalent to 
finding the signal that maximizes $p(\mathbf{r} \mid \mathbf{s}_m)$. 

The conditional pdf $p(\mathbf{r} \mid \mathbf{s}_m)$ or any monotonic function 
of it is usually called the likelihood function. The decision criterion based 
on the maximum of $p(\mathbf{r} \mid \mathbf{s}_m)$ over the $M$ signals is 
called the maximum-likelihood (ML) criterion. We observe that a detector based 
on the MAP criterion and one that is based on the ML criterion make the same 
decisions as long as the a priori probabilities $P(\mathbf{s}_m)$ are all 
equal, i.e., the signals $\{\mathbf{s}_m\}$ are equiprobable. 

In the case of an AWGN channel, the likelihood function 
$p(\mathbf{r} \mid \mathbf{s}_m)$ is given by (5-1-12). To simplify the 
computations, we may work with the natural logarithm of 
$p(\mathbf{r} \mid \mathbf{s}_m)$, which is a monotonic function. Thus, 

$$
\ln p(\mathbf{r} \mid \mathbf{s}_m) = -\tfrac{1}{2} N \ln(\pi N_0) - \frac{1}{N_0} \sum_{k=1}^{N} (r_k - s_{mk})^2
\tag{5-1-40}
$$

The maximum of $\ln p(\mathbf{r} \mid \mathbf{s}_m)$ over $\mathbf{s}_m$ is 
equivalent to finding the signal $\mathbf{s}_m$ that minimizes the Euclidean 
distance 

$$
D(\mathbf{r}, \mathbf{s}_m) = \sum_{k=1}^{N} (r_k - s_{mk})^2
\tag{5-1-41}
$$

We call $D(\mathbf{r}, \mathbf{s}_m)$, $m = 1, 2, \ldots, M$, the distance 
metrics. Hence, for the AWGN channel, the decision rule based on the ML 
criterion reduces to finding the signal $\mathbf{s}_m$ that is closest in 
distance to the received signal vector $\mathbf{r}$. We shall refer to this 
decision rule as minimum distance detection. 

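Another interpretation of the optimum decision rule based on the ML 
criterion is obtained by expanding the distance metrics in (5-1-41) as 

$$
\begin{aligned}
D(\mathbf{r}, \mathbf{s}_m) &= \sum_{n=1}^{N} r_n^2 - 2 \sum_{n=1}^{N} r_n s_{mn} + \sum_{n=1}^{N} s_{mn}^2 \\
  &= |\mathbf{r}|^2 - 2\, \mathbf{r} \cdot \mathbf{s}_m + |\mathbf{s}_m|^2, \qquad m = 1, 2, \ldots, M
\end{aligned}
\tag{5-1-42}
$$

The term $|\mathbf{r}|^2$ is common to all decision metrics, and, hence, it may 
be ignored in the computations of the metrics. The result is a set of modified 
distance metrics 

$$
D'(\mathbf{r}, \mathbf{s}_m) = -2\, \mathbf{r} \cdot \mathbf{s}_m + |\mathbf{s}_m|^2
\tag{5-1-43}
$$

Note that selecting the signal $\mathbf{s}_m$ that minimizes 
$D'(\mathbf{r}, \mathbf{s}_m)$ is equivalent to selecting the signal that 
maximizes the metric $C(\mathbf{r}, \mathbf{s}_m) = -D'(\mathbf{r}, \mathbf{s}_m)$, i.e., 

$$
C(\mathbf{r}, \mathbf{s}_m) = 2\, \mathbf{r} \cdot \mathbf{s}_m - |\mathbf{s}_m|^2
\tag{5-1-44}
$$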

The term $\mathbf{r} \cdot \mathbf{s}_m$ represents the projection of the 
received signal vector onto each of the $M$ possible transmitted signal 
vectors. The value of each of these projections is a measure of the 
correlation between the received vector and the $m$th signal. For this reason, 
we call $C(\mathbf{r}, \mathbf{s}_m)$, $m = 1, 2, \ldots, M$, the correlation 
metrics for deciding which of the $M$ signals was transmitted. Finally, the 
terms $|\mathbf{s}_m|^2 = \mathcal{E}_m$, $m = 1, 2, \ldots, M$, may be viewed 
as bias terms that serve as compensation for signal sets that have unequal 
energies, such as PAM. If all signals have the same energy, $|\mathbf{s}_m|^2$ 
may also be ignored in the computation of the correlation metrics 
$C(\mathbf{r}, \mathbf{s}_m)$ and the distance metrics 
$D(\mathbf{r}, \mathbf{s}_m)$ or $D'(\mathbf{r}, \mathbf{s}_m)$. 

It is easy to show (see Problem 5-5) that the correlation metrics 
$C(\mathbf{r}, \mathbf{s}_m)$ can also be expressed as 

$$
C(\mathbf{r}, \mathbf{s}_m) = 2 \int_0^T r(t) s_m(t)\, dt - \mathcal{E}_m, \qquad m = 1, 2, \ldots, M
\tag{5-1-45}
$$

Therefore, these metrics can be generated by a demodulator that 
cross-correlates the received signal $r(t)$ with each of the $M$ possible 
transmitted signals and adjusts each correlator output for the bias in the 
case of unequal signal energies. Equivalently, the received signal may be 
passed through a bank of $M$ filters matched to the possible transmitted 
signals $\{s_m(t)\}$ and sampled at $t = T$, the end of the symbol interval. 
Consequently, the optimum receiver (demodulator and detector) can be 
implemented in the alternative configuration illustrated in Fig. 5-1-9. 

FIGURE 5-1-9 An alternative realization of the optimum AWGN receiver. 

In summary, we have demonstrated that the optimum ML detector 
computes a set of $M$ distances $D(\mathbf{r}, \mathbf{s}_m)$ or 
$D'(\mathbf{r}, \mathbf{s}_m)$ and selects the signal corresponding to the 
smallest (distance) metric. Equivalently, the optimum ML detector computes a 
set of $M$ correlation metrics $C(\mathbf{r}, \mathbf{s}_m)$ and selects the 
signal corresponding to the largest correlation metric. 
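The equivalence of the two rules can be illustrated with a few lines of code. 
This is a minimal sketch rather than a reference implementation: the function 
names and the two example constellations (a 4-level PAM set with unequal 
energies and a four-point biorthogonal set with unit energy) are assumptions 
made for illustration. 

```python
def distance_metric(r, s):
    """D(r, s_m) of (5-1-41): squared Euclidean distance."""
    return sum((rk - sk) ** 2 for rk, sk in zip(r, s))

def correlation_metric(r, s):
    """C(r, s_m) of (5-1-44): 2 r.s_m - |s_m|^2."""
    return 2 * sum(rk * sk for rk, sk in zip(r, s)) - sum(sk * sk for sk in s)

def detect_min_distance(r, signals):
    """Index of the signal vector closest to r."""
    return min(range(len(signals)), key=lambda m: distance_metric(r, signals[m]))

def detect_max_correlation(r, signals):
    """Index of the signal vector with the largest correlation metric."""
    return max(range(len(signals)), key=lambda m: correlation_metric(r, signals[m]))

# Illustrative constellations (assumed, not taken from the text).
pam4 = [[-3.0], [-1.0], [1.0], [3.0]]
biorthogonal = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

print(detect_min_distance([1.8], pam4), detect_max_correlation([1.8], pam4))
print(detect_min_distance([0.2, -0.9], biorthogonal),
      detect_max_correlation([0.2, -0.9], biorthogonal))
```

For the PAM set the bias term $|\mathbf{s}_m|^2$ is essential: dropping it from 
`correlation_metric` would favor the outer points regardless of the received 
value. 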

The above development for the optimum detector treated the important case 
in which all signals are equally probable. In this case, the MAP criterion is 
equivalent to the ML criterion. However, when the signals are not equally 
probable, the optimum MAP detector bases its decision on the probabilities 
$P(\mathbf{s}_m \mid \mathbf{r})$, $m = 1, 2, \ldots, M$, given by (5-1-38) or, 
equivalently, on the metrics 

$$
PM(\mathbf{r}, \mathbf{s}_m) = p(\mathbf{r} \mid \mathbf{s}_m)\, P(\mathbf{s}_m)
$$

The following example illustrates this computation for binary PAM signals. 


Example 5-1-3 

Consider the case of binary PAM signals in which the two possible signal 
points are $s_1 = -s_2 = \sqrt{\mathcal{E}_b}$, where $\mathcal{E}_b$ is the 
energy per bit. The prior probabilities are $P(s_1) = p$ and 
$P(s_2) = 1 - p$. Let us determine the metrics for the optimum MAP detector 
when the transmitted signal is corrupted with AWGN. 

The received signal vector (one-dimensional) for binary PAM is 

$$
r = \pm\sqrt{\mathcal{E}_b} + y_n(T)
\tag{5-1-46}
$$

where $y_n(T)$ is a zero-mean Gaussian random variable with variance 
$\sigma_n^2 = \frac{1}{2} N_0$. Consequently, the conditional pdfs 
$p(r \mid s_m)$ for the two signals are 

$$
p(r \mid s_1) = \frac{1}{\sqrt{2\pi}\, \sigma_n} \exp\left[ -\frac{(r - \sqrt{\mathcal{E}_b})^2}{2\sigma_n^2} \right]
\tag{5-1-47}
$$

$$
p(r \mid s_2) = \frac{1}{\sqrt{2\pi}\, \sigma_n} \exp\left[ -\frac{(r + \sqrt{\mathcal{E}_b})^2}{2\sigma_n^2} \right]
\tag{5-1-48}
$$

Then the metrics $PM(\mathbf{r}, \mathbf{s}_1)$ and 
$PM(\mathbf{r}, \mathbf{s}_2)$ are 

$$
PM(\mathbf{r}, \mathbf{s}_1) = p\, p(r \mid s_1)
  = \frac{p}{\sqrt{2\pi}\, \sigma_n} \exp\left[ -\frac{(r - \sqrt{\mathcal{E}_b})^2}{2\sigma_n^2} \right]
\tag{5-1-49}
$$

$$
PM(\mathbf{r}, \mathbf{s}_2)
  = \frac{1 - p}{\sqrt{2\pi}\, \sigma_n} \exp\left[ -\frac{(r + \sqrt{\mathcal{E}_b})^2}{2\sigma_n^2} \right]
\tag{5-1-50}
$$

If $PM(\mathbf{r}, \mathbf{s}_1) > PM(\mathbf{r}, \mathbf{s}_2)$, we select 
$s_1$ as the transmitted signal; otherwise, we select $s_2$. This decision 
rule may be expressed as 

$$
\frac{PM(\mathbf{r}, \mathbf{s}_1)}{PM(\mathbf{r}, \mathbf{s}_2)} \underset{s_2}{\overset{s_1}{\gtrless}} 1
\tag{5-1-51}
$$

But 

$$
\frac{PM(\mathbf{r}, \mathbf{s}_1)}{PM(\mathbf{r}, \mathbf{s}_2)}
  = \frac{p}{1 - p} \exp\left[ \frac{(r + \sqrt{\mathcal{E}_b})^2 - (r - \sqrt{\mathcal{E}_b})^2}{2\sigma_n^2} \right]
\tag{5-1-52}
$$

so that (5-1-51) may be expressed as 

$$
\frac{(r + \sqrt{\mathcal{E}_b})^2 - (r - \sqrt{\mathcal{E}_b})^2}{2\sigma_n^2}
  \underset{s_2}{\overset{s_1}{\gtrless}} \ln\frac{1 - p}{p}
\tag{5-1-53}
$$

or equivalently, 

$$
\sqrt{\mathcal{E}_b}\, r \underset{s_2}{\overset{s_1}{\gtrless}}
  \tfrac{1}{2} \sigma_n^2 \ln\frac{1 - p}{p} = \tfrac{1}{4} N_0 \ln\frac{1 - p}{p}
\tag{5-1-54}
$$

This is the final form for the optimum detector. It computes the 
correlation metric $C(\mathbf{r}, \mathbf{s}_1) = r\sqrt{\mathcal{E}_b}$ and 
compares it with the threshold $\frac{1}{4} N_0 \ln[(1 - p)/p]$. Figure 5-1-10 
illustrates the two signal points $s_1$ and $s_2$. The threshold, denoted by 
$\tau_h$, divides the real line into two regions, say $R_1$ and $R_2$, where 
$R_1$ consists of the set of points that are greater than $\tau_h$ and $R_2$ 
consists of the set of points that are less than $\tau_h$. If 
$r\sqrt{\mathcal{E}_b} > \tau_h$, the decision is made that $s_1$ was 
transmitted, and if $r\sqrt{\mathcal{E}_b} < \tau_h$, the decision is made 
that $s_2$ was transmitted. The threshold $\tau_h$ depends on $N_0$ and $p$. 
If $p = \frac{1}{2}$, $\tau_h = 0$. If $p > \frac{1}{2}$, the signal point 
$s_1$ is more probable and, hence, $\tau_h < 0$. In this case, the region 
$R_1$ is larger than $R_2$, so that $s_1$ is more likely to be selected than 
$s_2$. If $p < \frac{1}{2}$, the opposite is the case. Thus, the average 
probability of error is minimized. 

FIGURE 5-1-10 Signal space representation illustrating the operation of the 
optimum detector for binary (PAM) modulation. 


It is interesting to note that in the case of unequal prior probabilities, it is 
necessary to know not only the values of the prior probabilities but also the 
value of the power spectral density $N_0$ in order to compute the threshold. 
When $p = \frac{1}{2}$, the threshold is zero, and knowledge of $N_0$ is not 
required by the detector. 
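The threshold form (5-1-54) and the direct comparison of the metrics (5-1-49) 
and (5-1-50) should always lead to the same decision. The following sketch 
checks this on a grid of received values. The function names and the 
parameter values ($\mathcal{E}_b = 1$, $N_0 = 1$) are illustrative 
assumptions, and the factor $1/(\sqrt{2\pi}\,\sigma_n)$ common to both metrics 
is omitted. 

```python
import math

def map_threshold(n0, p):
    """Threshold tau_h = (N0 / 4) ln((1 - p) / p) from (5-1-54)."""
    return 0.25 * n0 * math.log((1 - p) / p)

def decide_by_threshold(r, eb, n0, p):
    """Return 1 or 2 by comparing r * sqrt(Eb) with tau_h."""
    return 1 if r * math.sqrt(eb) > map_threshold(n0, p) else 2

def decide_by_metrics(r, eb, n0, p):
    """Return 1 or 2 by comparing PM(r, s1) and PM(r, s2) directly."""
    var = n0 / 2                       # sigma_n^2 = N0 / 2
    a = math.sqrt(eb)
    pm1 = p * math.exp(-(r - a) ** 2 / (2 * var))
    pm2 = (1 - p) * math.exp(-(r + a) ** 2 / (2 * var))
    return 1 if pm1 > pm2 else 2

agree = all(
    decide_by_threshold(k / 10, 1.0, 1.0, p) == decide_by_metrics(k / 10, 1.0, 1.0, p)
    for p in (0.2, 0.5, 0.8)
    for k in range(-20, 21)
)
print(agree, map_threshold(1.0, 0.5), map_threshold(1.0, 0.8))
```

For $p = 0.8$ the threshold is negative, so the region assigned to $s_1$ 
extends to the left of the origin, as described in the example. 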

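We conclude this section with the proof that the decision rule based on the 
maximum-likelihood criterion minimizes the probability of error when the $M$ 
signals are equally probable a priori. Let us denote by $R_m$ the region in the 
$N$-dimensional space for which we decide that signal $s_m(t)$ was transmitted 
when the vector $\mathbf{r} = [r_1 \; r_2 \; \cdots \; r_N]$ is received. The 
probability of a decision error given that $s_m(t)$ was transmitted is 

$$
P(e \mid \mathbf{s}_m) = \int_{R_m^c} p(\mathbf{r} \mid \mathbf{s}_m)\, d\mathbf{r}
\tag{5-1-55}
$$

where $R_m^c$ is the complement of $R_m$. The average probability of error is 

$$
\begin{aligned}
P(e) &= \sum_{m=1}^{M} \frac{1}{M} P(e \mid \mathbf{s}_m) \\
     &= \sum_{m=1}^{M} \frac{1}{M} \int_{R_m^c} p(\mathbf{r} \mid \mathbf{s}_m)\, d\mathbf{r} \\
     &= \sum_{m=1}^{M} \frac{1}{M} \left[ 1 - \int_{R_m} p(\mathbf{r} \mid \mathbf{s}_m)\, d\mathbf{r} \right]
\end{aligned}
\tag{5-1-56}
$$

Note that $P(e)$ is minimized by selecting the signal $\mathbf{s}_m$ if 
$p(\mathbf{r} \mid \mathbf{s}_m)$ is larger than 
$p(\mathbf{r} \mid \mathbf{s}_k)$ for all $m \ne k$. 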

When the M signals are not equally probable, the above proof can be 
generalized to show that the MAP criterion minimizes the average probability 
of error. 


5-1-4 The Maximum-Likelihood Sequence Detector 

When the signal has no memory, the symbol-by-symbol detector described in 
the preceding section is optimum in the sense of minimizing the probability of 
a symbol error. On the other hand, when the transmitted signal has memory, 
i.e., the signals transmitted in successive symbol intervals are interdependent, 
the optimum detector is a detector that bases its decisions on observation of a 
sequence of received signals over successive signal intervals. Below, we 
describe two different types of detection algorithms. In this section, we 
describe a maximum-likelihood sequence detection algorithm that searches for 
the minimum euclidean distance path through the trellis that characterizes the 
memory in the transmitted signal. In the following section, we describe a 
maximum a posteriori probability algorithm that makes decisions on a 
symbol-by-symbol basis, but each symbol decision is based on an observation 
of a sequence of received signal vectors. 

To develop the maximum likelihood sequence detection algorithm, let us 
consider, as an example, the NRZI signal described in Section 4-3-2. Its 
memory is characterized by the trellis shown in Fig. 4-3-14. The signal 
transmitted in each signal interval is binary PAM. Hence, there are two 
possible transmitted signals corresponding to the signal points 
$s_1 = -s_2 = \sqrt{\mathcal{E}_b}$, where $\mathcal{E}_b$ is the energy per 
bit. The output of the matched-filter or correlation demodulator for binary 
PAM in the $k$th signal interval may be expressed as 

$$
r_k = \pm\sqrt{\mathcal{E}_b} + n_k
\tag{5-1-57}
$$


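where $n_k$ is a zero-mean gaussian random variable with variance 
$\sigma_n^2 = N_0/2$. Consequently, the conditional pdfs for the two possible 
transmitted signals are 

$$
\begin{aligned}
p(r_k \mid s_1) &= \frac{1}{\sqrt{2\pi}\, \sigma_n} \exp\left[ -\frac{(r_k - \sqrt{\mathcal{E}_b})^2}{2\sigma_n^2} \right] \\
p(r_k \mid s_2) &= \frac{1}{\sqrt{2\pi}\, \sigma_n} \exp\left[ -\frac{(r_k + \sqrt{\mathcal{E}_b})^2}{2\sigma_n^2} \right]
\end{aligned}
\tag{5-1-58}
$$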

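Now, suppose we observe the sequence of matched-filter outputs 
$r_1, r_2, \ldots, r_K$. Since the channel noise is assumed to be white and 
gaussian, and $f(t - iT)$, $f(t - jT)$ for $i \ne j$ are orthogonal, it follows 
that $E(n_k n_j) = 0$, $k \ne j$. Hence, the noise sequence 
$n_1, n_2, \ldots, n_K$ is also white. Consequently, for any given transmitted 
sequence $\mathbf{s}^{(m)}$, the joint pdf of $r_1, r_2, \ldots, r_K$ may be 
expressed as a product of $K$ marginal pdfs, i.e., 

$$
\begin{aligned}
p(r_1, r_2, \ldots, r_K \mid \mathbf{s}^{(m)}) &= \prod_{k=1}^{K} p(r_k \mid s_k^{(m)}) \\
  &= \prod_{k=1}^{K} \frac{1}{\sqrt{2\pi}\, \sigma_n} \exp\left[ -\frac{(r_k - s_k^{(m)})^2}{2\sigma_n^2} \right] \\
  &= \left( \frac{1}{\sqrt{2\pi}\, \sigma_n} \right)^{K} \exp\left[ -\sum_{k=1}^{K} \frac{(r_k - s_k^{(m)})^2}{2\sigma_n^2} \right]
\end{aligned}
\tag{5-1-59}
$$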


where either $s_k = \sqrt{\mathcal{E}_b}$ or $s_k = -\sqrt{\mathcal{E}_b}$. 
Then, given the received sequence $r_1, r_2, \ldots, r_K$ at the output of the 
matched filter or correlation demodulator, the detector determines the 
sequence $\mathbf{s}^{(m)} = \{s_1^{(m)}, s_2^{(m)}, \ldots, s_K^{(m)}\}$ that 
maximizes the conditional pdf $p(r_1, r_2, \ldots, r_K \mid \mathbf{s}^{(m)})$. 
Such a detector is called the maximum-likelihood (ML) sequence detector. 

By taking the logarithm of (5-1-59) and neglecting the terms that are 
independent of $(r_1, r_2, \ldots, r_K)$, we find that an equivalent ML 
sequence detector selects the sequence $\mathbf{s}^{(m)}$ that minimizes the 
euclidean distance metric 

$$
D(\mathbf{r}, \mathbf{s}^{(m)}) = \sum_{k=1}^{K} (r_k - s_k^{(m)})^2
\tag{5-1-60}
$$

FIGURE 5-1-11 Trellis for NRZI signal. 


In searching through the trellis for the sequence that minimizes the 
euclidean distance $D(\mathbf{r}, \mathbf{s}^{(m)})$, it may appear that we 
must compute the distance $D(\mathbf{r}, \mathbf{s}^{(m)})$ for every possible 
sequence. For the NRZI example, which employs binary modulation, the total 
number of sequences is $2^K$, where $K$ is the number of outputs obtained from 
the demodulator. However, this is not the case. We may reduce the number of 
sequences in the trellis search by using the Viterbi algorithm to eliminate 
sequences as new data is received from the demodulator. 

The Viterbi algorithm is a sequential trellis search algorithm for performing 
ML sequence detection. It is described in Chapter 8 as a decoding algorithm 
for convolutional codes. We describe it below in the context of the NRZI 
signal. We assume that the search process begins initially at state $S_0$. The 
corresponding trellis is shown in Fig. 5-1-11. 

At time $t = T$, we receive $r_1 = s_1^{(m)} + n_1$ from the demodulator, and 
at $t = 2T$, we receive $r_2 = s_2^{(m)} + n_2$. Since the signal memory is one 
bit, which we denote by $L = 1$, we observe that the trellis reaches its 
regular (steady state) form after two transitions. Thus, upon receipt of $r_2$ 
at $t = 2T$ (and thereafter), we observe that there are two signal paths 
entering each of the nodes and two signal paths leaving each node. The two 
paths entering node $S_0$ at $t = 2T$ correspond to the information bits 
$(0, 0)$ and $(1, 1)$ or, equivalently, to the signal points 
$(-\sqrt{\mathcal{E}_b}, -\sqrt{\mathcal{E}_b})$ and 
$(\sqrt{\mathcal{E}_b}, -\sqrt{\mathcal{E}_b})$, respectively. The two paths 
entering node $S_1$ at $t = 2T$ correspond to the information bits $(0, 1)$ and 
$(1, 0)$ or, equivalently, to the signal points 
$(-\sqrt{\mathcal{E}_b}, \sqrt{\mathcal{E}_b})$ and 
$(\sqrt{\mathcal{E}_b}, \sqrt{\mathcal{E}_b})$, respectively. 

For the two paths entering node $S_0$, we compute the two Euclidean distance 
metrics 

$$
\begin{aligned}
D_0(0, 0) &= (r_1 + \sqrt{\mathcal{E}_b})^2 + (r_2 + \sqrt{\mathcal{E}_b})^2 \\
D_0(1, 1) &= (r_1 - \sqrt{\mathcal{E}_b})^2 + (r_2 + \sqrt{\mathcal{E}_b})^2
\end{aligned}
\tag{5-1-61}
$$

by using the outputs $r_1$ and $r_2$ from the demodulator. The Viterbi 
algorithm compares these two metrics and discards the path having the larger 
(greater-distance) metric.† The other path with the lower metric is saved and 
is called the survivor at $t = 2T$. The elimination of one of the two paths 
may be done without compromising the optimality of the trellis search, because 
any extension of the path with the larger distance beyond $t = 2T$ will always 
have a larger metric than the survivor that is extended along the same path 
beyond $t = 2T$. 

Similarly, for the two paths entering node $S_1$ at $t = 2T$, we compute the 
two Euclidean distance metrics 

$$
\begin{aligned}
D_1(0, 1) &= (r_1 + \sqrt{\mathcal{E}_b})^2 + (r_2 - \sqrt{\mathcal{E}_b})^2 \\
D_1(1, 0) &= (r_1 - \sqrt{\mathcal{E}_b})^2 + (r_2 - \sqrt{\mathcal{E}_b})^2
\end{aligned}
\tag{5-1-62}
$$

by using the outputs $r_1$ and $r_2$ from the demodulator. The two metrics are 
compared and the signal path with the larger metric is eliminated. Thus, at 
$t = 2T$, we are left with two survivor paths, one at node $S_0$ and the other 
at node $S_1$, and their corresponding metrics. The signal paths at nodes 
$S_0$ and $S_1$ are then extended along the two survivor paths. 

Upon receipt of $r_3$ at $t = 3T$, we compute the metrics of the two paths 
entering state $S_0$. Suppose the survivors at $t = 2T$ are the paths $(0, 0)$ 
at $S_0$ and $(0, 1)$ at $S_1$. Then, the two metrics for the paths entering 
$S_0$ at $t = 3T$ are 

$$
\begin{aligned}
D_0(0, 0, 0) &= D_0(0, 0) + (r_3 + \sqrt{\mathcal{E}_b})^2 \\
D_0(0, 1, 1) &= D_1(0, 1) + (r_3 + \sqrt{\mathcal{E}_b})^2
\end{aligned}
\tag{5-1-63}
$$

These two metrics are compared and the path with the larger (greater-distance) 
metric is eliminated. Similarly, the metrics for the two paths entering 
$S_1$ at $t = 3T$ are 

$$
\begin{aligned}
D_1(0, 0, 1) &= D_0(0, 0) + (r_3 - \sqrt{\mathcal{E}_b})^2 \\
D_1(0, 1, 0) &= D_1(0, 1) + (r_3 - \sqrt{\mathcal{E}_b})^2
\end{aligned}
\tag{5-1-64}
$$

These two metrics are compared and the path with the larger (greater-distance) 
metric is eliminated. 

This process is continued as each new signal sample is received from the 
demodulator. Thus, the Viterbi algorithm computes two metrics for the two 
signal paths entering a node at each stage of the trellis search and eliminates 
one of the two paths at each node. The two survivor paths are then extended 
forward to the next state. Therefore, the number of paths searched in the 
trellis is reduced by a factor of two at each stage. 
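The survivor bookkeeping described above can be sketched for the two-state 
NRZI trellis. This is a minimal illustration, not a production decoder: it 
assumes the state convention implied by the path listing in the text (a 1 bit 
toggles the state, a 0 bit holds it, and the level emitted on a transition is 
$-\sqrt{\mathcal{E}_b}$ when entering $S_0$ and $+\sqrt{\mathcal{E}_b}$ when 
entering $S_1$), and the function names are hypothetical. An exhaustive search 
over all $2^K$ sequences is included only as a cross-check. 

```python
import itertools
import math

def nrzi_encode(bits, eb=1.0):
    """Signal levels for a bit sequence, starting in state S0."""
    a = math.sqrt(eb)
    state, out = 0, []
    for b in bits:
        state ^= b                       # a 1 toggles the state
        out.append(a if state else -a)
    return out

def viterbi_nrzi(r, eb=1.0):
    """ML bit sequence for demodulator outputs r, keeping one survivor per state."""
    a = math.sqrt(eb)
    level = (-a, a)                      # level emitted on entering S0 / S1
    survivors = {0: (0.0, [])}           # state -> (metric, bits); start in S0
    for rk in r:
        new = {}
        for prev, (metric, bits) in survivors.items():
            for b in (0, 1):
                nxt = prev ^ b
                cand = (metric + (rk - level[nxt]) ** 2, bits + [b])
                # keep only the smaller-distance path entering each node
                if nxt not in new or cand[0] < new[nxt][0]:
                    new[nxt] = cand
        survivors = new
    return min(survivors.values(), key=lambda mb: mb[0])[1]

def brute_force_nrzi(r, eb=1.0):
    """Exhaustive minimization of (5-1-60) over all 2^K bit sequences."""
    def dist(bits):
        return sum((rk - sk) ** 2 for rk, sk in zip(r, nrzi_encode(bits, eb)))
    return list(min(itertools.product((0, 1), repeat=len(r)), key=dist))

received = [0.9, 1.2, -0.3, 0.4, 0.8, -1.1, 0.2]   # arbitrary noisy samples
print(viterbi_nrzi(received), brute_force_nrzi(received))
```

Both searches return the same sequence, but the trellis search keeps only two 
survivors per stage instead of enumerating every path. 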

It is relatively easy to generalize the trellis search performed by the Viterbi 
algorithm for $M$-ary modulation. For example, delay modulation employs 
$M = 4$ signals and is characterized by the four-state trellis shown in Fig. 
5-1-12. We observe that each state has two signal paths entering and two 
signal paths leaving each node. The memory of the signal is $L = 1$. Hence, the 
Viterbi algorithm will have four survivors at each stage and their 
corresponding metrics. Two metrics corresponding to the two entering paths are 
computed at each node, and one of the two signal paths entering the node is 
eliminated at each stage of the trellis. Thus, the Viterbi algorithm minimizes 
the number of trellis paths searched in performing ML sequence detection. 

FIGURE 5-1-12 One stage of trellis diagram for delay modulation. 

† Note that, for NRZI, the reception of $r_2$ from the demodulator neither 
increases nor decreases the relative difference between the two metrics 
$D_0(0, 0)$ and $D_0(1, 1)$. At this point, one may ponder on the implication 
of this observation. In any case, we continue with the description of the ML 
sequence detector based on the Viterbi algorithm. 

From the description of the Viterbi algorithm given above, it is unclear as to 
how decisions are made on the individual detected information symbols given 
the surviving sequences. If we have advanced to some stage, say $K$, where 
$K \gg L$ in the trellis, and we compare the surviving sequences, we shall find 
that with probability approaching one all surviving sequences will be 
identical in bit (or symbol) positions $K - 5L$ and less. In a practical 
implementation of the Viterbi algorithm, decisions on each information bit (or 
symbol) are forced after a delay of $5L$ bits (or symbols), and hence, the 
surviving sequences are truncated to the $5L$ most recent bits (or symbols). 
Thus, a variable delay in bit or symbol detection is avoided. The loss in 
performance resulting from the suboptimum detection procedure is negligible if 
the delay is at least $5L$. 

Example 5-1-4 

Consider the decision rule for detecting the data sequence in an NRZI 
signal with a Viterbi algorithm having a delay of $5L$ bits. The trellis for 
the NRZI signal is shown in Fig. 5-1-11. In this case, $L = 1$, hence the delay 
in bit detection is set to five bits. Hence, at $t = 6T$, we shall have two 
surviving sequences, one for each of the two states, and the corresponding 
metrics $\mu_6(b_1, b_2, b_3, b_4, b_5, b_6)$ and 
$\mu_6(b_1', b_2', b_3', b_4', b_5', b_6')$. At this stage, with probability 
nearly equal to one, the bit $b_1$ will be the same as $b_1'$; that is, both 
surviving sequences will have a common first branch. If $b_1 \ne b_1'$, we 
may select the bit ($b_1$ or $b_1'$) corresponding to the smaller of the two 
metrics. Then the first bit is dropped from the two surviving sequences. At 
$t = 7T$, the two metrics $\mu_7(b_2, b_3, b_4, b_5, b_6, b_7)$ and 
$\mu_7(b_2', b_3', b_4', b_5', b_6', b_7')$ will be used to determine the 
decision on bit $b_2$. This process continues at each stage of the search 
through the trellis for the minimum distance sequence. Thus the detection 
delay is fixed at five bits.† 


5-1-5 A Symbol-by-Symbol Detector for Signals 
with Memory 

In contrast to the maximum-likelihood sequence detector for detecting the 
transmitted information, we now describe a detector that makes symbol-by- 
symbol decisions based on the computation of the maximum a posteriori 
probability (MAP) for each detected symbol. Hence, this detector is optimum 
in the sense that it minimizes the probability of a symbol error. The detection 
algorithm that is presented below is due to Abend and Fritchman (1970), who 
developed it as a detection algorithm for channels with intersymbol inter- 
ference, i.e., channels with memory. 

We illustrate the algorithm in the context of detecting a PAM signal with $M$ 
possible levels. Suppose that it is desired to detect the information symbol 
transmitted in the $k$th signal interval, and let 
$r_1, r_2, \ldots, r_{k+D}$ be the observed received sequence, where $D$ is the 
delay parameter which is chosen to exceed the signal memory, i.e., $D \ge L$, 
where $L$ is the inherent memory in the signal. On the basis of the received 
sequence, we compute the posterior probabilities 

$$
P(s^{(k)} = A_m \mid r_{k+D}, r_{k+D-1}, \ldots, r_1)
\tag{5-1-65}
$$

for the $M$ possible symbol values and choose the symbol with the largest 
probability. Since 

$$
P(s^{(k)} = A_m \mid r_{k+D}, \ldots, r_1)
  = \frac{p(r_{k+D}, \ldots, r_1 \mid s^{(k)} = A_m)\, P(s^{(k)} = A_m)}{p(r_{k+D}, r_{k+D-1}, \ldots, r_1)}
\tag{5-1-66}
$$

and since the denominator is common for all $M$ probabilities, the maximum a 
posteriori probability (MAP) criterion is equivalent to choosing the value of 
$s^{(k)}$ that maximizes the numerator of (5-1-66). Thus, the criterion for 
deciding on the transmitted symbol $s^{(k)}$ is 

$$
\hat{s}^{(k)} = \arg\left\{ \max_{s^{(k)}}\, p(r_{k+D}, \ldots, r_1 \mid s^{(k)} = A_m)\, P(s^{(k)} = A_m) \right\}
\tag{5-1-67}
$$


When the symbols are equally probable, the probability $P(s^{(k)} = A_m)$ may 
be dropped from the computation. 

† One may have observed by now that the ML sequence detector and the 
symbol-by-symbol detector that ignores the memory in the NRZI signal reach the 
same decisions. Hence, there is no need for a decision delay. Nevertheless, 
the procedure described above applies in general. 

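The algorithm for computing the probabilities in (5-1-67) recursively begins 
with the first symbol $s^{(1)}$. We have 

$$
\begin{aligned}
\hat{s}^{(1)} &= \arg\left\{ \max_{s^{(1)}}\, p(r_{1+D}, \ldots, r_1 \mid s^{(1)} = A_m)\, P(s^{(1)} = A_m) \right\} \\
  &= \arg\left\{ \max_{s^{(1)}} \sum_{s^{(1+D)}} \cdots \sum_{s^{(2)}} p(r_{1+D}, \ldots, r_1 \mid s^{(1+D)}, \ldots, s^{(1)})\, P(s^{(1+D)}, \ldots, s^{(1)}) \right\} \\
  &= \arg\left\{ \max_{s^{(1)}} \sum_{s^{(1+D)}} \cdots \sum_{s^{(2)}} p_1(s^{(1+D)}, \ldots, s^{(2)}, s^{(1)}) \right\}
\end{aligned}
\tag{5-1-68}
$$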


where $\hat{s}^{(1)}$ denotes the decision on $s^{(1)}$ and, for mathematical 
convenience, we have defined 

$$
p_1(s^{(1+D)}, \ldots, s^{(2)}, s^{(1)}) \equiv p(r_{1+D}, \ldots, r_1 \mid s^{(1+D)}, \ldots, s^{(1)})\, P(s^{(1+D)}, \ldots, s^{(1)})
\tag{5-1-69}
$$


The joint probability $P(s^{(1+D)}, \ldots, s^{(2)}, s^{(1)})$ may be omitted if 
the symbols are equally probable and statistically independent. As a 
consequence of the statistical independence of the additive noise sequence, we 
have 

$$
\begin{aligned}
p(r_{1+D}, \ldots, r_1 \mid s^{(1+D)}, \ldots, s^{(1)})
  ={}& p(r_{1+D} \mid s^{(1+D)}, \ldots, s^{(1+D-L)})\, p(r_D \mid s^{(D)}, \ldots, s^{(D-L)}) \cdots \\
     & p(r_2 \mid s^{(2)}, s^{(1)})\, p(r_1 \mid s^{(1)})
\end{aligned}
\tag{5-1-70}
$$

where we assume that $s^{(k)} = 0$ for $k \le 0$. 

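For detection of the symbol $s^{(2)}$, we have 

$$
\begin{aligned}
\hat{s}^{(2)} &= \arg\left\{ \max_{s^{(2)}}\, p(r_{2+D}, \ldots, r_1 \mid s^{(2)} = A_m)\, P(s^{(2)} = A_m) \right\} \\
  &= \arg\left\{ \max_{s^{(2)}} \sum_{s^{(2+D)}} \cdots \sum_{s^{(3)}} p(r_{2+D}, \ldots, r_1 \mid s^{(2+D)}, \ldots, s^{(2)})\, P(s^{(2+D)}, \ldots, s^{(2)}) \right\}
\end{aligned}
\tag{5-1-71}
$$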


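The joint conditional probability in the multiple summation can be expressed 
as 

$$
p(r_{2+D}, \ldots, r_1 \mid s^{(2+D)}, \ldots, s^{(2)})
  = p(r_{2+D} \mid s^{(2+D)}, \ldots, s^{(2+D-L)})\, p(r_{1+D}, \ldots, r_1 \mid s^{(1+D)}, \ldots, s^{(2)})
\tag{5-1-72}
$$

Furthermore, the joint probability 

$$
p(r_{1+D}, \ldots, r_1 \mid s^{(1+D)}, \ldots, s^{(2)})\, P(s^{(1+D)}, \ldots, s^{(2)})
$$

can be obtained from the probabilities computed previously in the detection of 
$s^{(1)}$. That is, 

$$
\begin{aligned}
p(r_{1+D}, \ldots, r_1 \mid s^{(1+D)}, \ldots, s^{(2)})\, P(s^{(1+D)}, \ldots, s^{(2)})
  &= \sum_{s^{(1)}} p(r_{1+D}, \ldots, r_1 \mid s^{(1+D)}, \ldots, s^{(1)})\, P(s^{(1+D)}, \ldots, s^{(1)}) \\
  &= \sum_{s^{(1)}} p_1(s^{(1+D)}, \ldots, s^{(2)}, s^{(1)})
\end{aligned}
\tag{5-1-73}
$$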


Thus, by combining (5-1-73) and (5-1-72) and then substituting into (5-1-71), 
we obtain 

$$
\hat{s}^{(2)} = \arg\left\{ \max_{s^{(2)}} \sum_{s^{(2+D)}} \cdots \sum_{s^{(3)}} p_2(s^{(2+D)}, \ldots, s^{(3)}, s^{(2)}) \right\}
\tag{5-1-74}
$$

where, by definition, 

$$
p_2(s^{(2+D)}, \ldots, s^{(3)}, s^{(2)})
  = p(r_{2+D} \mid s^{(2+D)}, \ldots, s^{(2+D-L)})\, P(s^{(2+D)}) \sum_{s^{(1)}} p_1(s^{(1+D)}, \ldots, s^{(2)}, s^{(1)})
\tag{5-1-75}
$$

In general, the recursive algorithm for detecting the symbol $s^{(k)}$ is as 
follows: upon reception of $r_{k+D}, \ldots, r_2, r_1$, we compute 

$$
\begin{aligned}
\hat{s}^{(k)} &= \arg\left\{ \max_{s^{(k)}}\, p(r_{k+D}, \ldots, r_1 \mid s^{(k)})\, P(s^{(k)}) \right\} \\
  &= \arg\left\{ \max_{s^{(k)}} \sum_{s^{(k+D)}} \cdots \sum_{s^{(k+1)}} p_k(s^{(k+D)}, \ldots, s^{(k+1)}, s^{(k)}) \right\}
\end{aligned}
\tag{5-1-76}
$$

where, by definition, 

$$
p_k(s^{(k+D)}, \ldots, s^{(k+1)}, s^{(k)})
  = p(r_{k+D} \mid s^{(k+D)}, \ldots, s^{(k+D-L)})\, P(s^{(k+D)}) \sum_{s^{(k-1)}} p_{k-1}(s^{(k-1+D)}, \ldots, s^{(k-1)})
\tag{5-1-77}
$$

Thus, the recursive nature of the algorithm is established by the relations 
(5-1-76) and (5-1-77). 

The major problem with the algorithm is its computational complexity. In 
particular, the averaging performed over the symbols 
$s^{(k+D)}, \ldots, s^{(k+1)}, s^{(k)}$ in (5-1-76) involves a large amount of 
computation per received signal, especially if the number $M$ of amplitude 
levels $\{A_m\}$ is large. On the other hand, if $M$ is small and the memory 
$L$ is relatively short, this algorithm is easily implemented. 
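To make the criterion (5-1-67) concrete, the following sketch evaluates it by 
brute force rather than by the recursion (5-1-76) and (5-1-77), so its cost 
grows as $M^{k+D}$ and it is useful only as an illustration. It assumes a 
hypothetical memory-one model $r_j = s^{(j)} + a\, s^{(j-1)} + n_j$ with 
$s^{(0)} = 0$, equally probable independent symbols (so the priors drop out), 
and gaussian noise of variance $N_0/2$. The function name `map_symbol` and the 
parameter `isi` (the coefficient $a$) are illustrative only. 

```python
import itertools
import math

def map_symbol(r, k, delay, levels, isi, n0):
    """Brute-force MAP decision on symbol k (1-based) from r_1 ... r_{k+D}."""
    horizon = min(k + delay, len(r))
    score = {a: 0.0 for a in levels}
    for seq in itertools.product(levels, repeat=horizon):
        prev, expo = 0.0, 0.0                  # s^(0) = 0, as assumed in (5-1-70)
        for rj, sj in zip(r, seq):             # zip stops after `horizon` samples
            expo += (rj - sj - isi * prev) ** 2
            prev = sj
        # product of gaussian factors with variance N0/2, constants dropped
        score[seq[k - 1]] += math.exp(-expo / n0)
    return max(levels, key=lambda a: score[a])

# Noise-free illustration with binary levels and a = 0.5.
symbols = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]
isi = 0.5
r, prev = [], 0.0
for s in symbols:
    r.append(s + isi * prev)
    prev = s
detected = [map_symbol(r, k, 2, (-1.0, 1.0), isi, 0.5) for k in range(1, 7)]
print(detected)
```

The sum accumulated in `score` for each level is the marginalization over all 
other symbols that appears in (5-1-68); the recursion in the text organizes 
the same sums so that earlier partial results are reused. 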


5-2 PERFORMANCE OF THE OPTIMUM RECEIVER 
FOR MEMORYLESS MODULATION 

In this section, we evaluate the probability of error for the memoryless 
modulation signals described in Section 4-3-1. First, we consider binary PAM 
signals and then $M$-ary signals of various types. 


5-2-1 Probability of Error for Binary Modulation 

Let us consider binary PAM signals where the two signal waveforms are 
$s_1(t) = g(t)$ and $s_2(t) = -g(t)$, and $g(t)$ is an arbitrary pulse that is 
nonzero in the interval $0 \le t \le T_b$ and zero elsewhere. 

Since $s_1(t) = -s_2(t)$, these signals are said to be antipodal. The energy 
in the pulse $g(t)$ is $\mathcal{E}_b$. As indicated in Section 4-3-1, PAM 
signals are one-dimensional, and, hence, their geometric representation is 
simply the one-dimensional vector $s_1 = \sqrt{\mathcal{E}_b}$, 
$s_2 = -\sqrt{\mathcal{E}_b}$. Figure 5-2-1 illustrates the two signal points. 

Let us assume that the two signals are equally likely and that signal $s_1(t)$ 
was transmitted. Then, the received signal from the (matched filter or 
correlation) demodulator is 

$$
r = s_1 + n = \sqrt{\mathcal{E}_b} + n
\tag{5-2-1}
$$

where $n$ represents the additive gaussian noise component, which has zero 
mean and variance $\sigma_n^2 = \frac{1}{2} N_0$. In this case, the decision 
rule based on the correlation metric given by (5-1-44) compares $r$ with the 
threshold zero. If $r > 0$, the decision is made in favor of $s_1(t)$, and if 
$r < 0$, the decision is made that $s_2(t)$ was transmitted. Clearly, the two 
conditional pdfs of $r$ are 

$$
p(r \mid s_1) = \frac{1}{\sqrt{\pi N_0}}\, e^{-(r - \sqrt{\mathcal{E}_b})^2 / N_0}
\tag{5-2-2}
$$

$$
p(r \mid s_2) = \frac{1}{\sqrt{\pi N_0}}\, e^{-(r + \sqrt{\mathcal{E}_b})^2 / N_0}
\tag{5-2-3}
$$

These two conditional pdfs are shown in Fig. 5-2-2. 

FIGURE 5-2-1 Signal points for binary antipodal signals. 

FIGURE 5-2-2 Conditional pdfs of two signals. 

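Given that $s_1(t)$ was transmitted, the probability of error is simply the 
probability that $r < 0$, i.e., 

$$
\begin{aligned}
P(e \mid s_1) &= \int_{-\infty}^{0} p(r \mid s_1)\, dr \\
  &= \frac{1}{\sqrt{\pi N_0}} \int_{-\infty}^{0} \exp\left[ -\frac{(r - \sqrt{\mathcal{E}_b})^2}{N_0} \right] dr \\
  &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-\sqrt{2\mathcal{E}_b/N_0}} e^{-x^2/2}\, dx \\
  &= \frac{1}{\sqrt{2\pi}} \int_{\sqrt{2\mathcal{E}_b/N_0}}^{\infty} e^{-x^2/2}\, dx \\
  &= Q\left( \sqrt{\frac{2\mathcal{E}_b}{N_0}} \right)
\end{aligned}
\tag{5-2-4}
$$

where $Q(x)$ is the Q-function defined in (2-1-97). Similarly, if we assume 
that $s_2(t)$ was transmitted, $r = -\sqrt{\mathcal{E}_b} + n$ and the 
probability that $r > 0$ is also 
$P(e \mid s_2) = Q(\sqrt{2\mathcal{E}_b/N_0})$. Since the signals $s_1(t)$ and 
$s_2(t)$ are equally likely to be transmitted, the average probability of 
error is 

$$
P_b = \tfrac{1}{2} P(e \mid s_1) + \tfrac{1}{2} P(e \mid s_2) = Q\left( \sqrt{\frac{2\mathcal{E}_b}{N_0}} \right)
\tag{5-2-5}
$$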

We should observe two important characteristics of this performance 
measure. First, we note that the probability of error depends only on the ratio 
$\mathcal{E}_b/N_0$ and not on any other detailed characteristics of the 
signals and the noise. Secondly, we note that $2\mathcal{E}_b/N_0$ is also the 
output $\mathrm{SNR}_0$ from the matched-filter (and correlation) demodulator. 
The ratio $\mathcal{E}_b/N_0$ is usually called the signal-to-noise ratio per 
bit. 

We also observe that the probability of error may be expressed in terms of 
the distance between the two signals $s_1$ and $s_2$. From Fig. 5-2-1, we 
observe that the two signals are separated by the distance 
$d_{12} = 2\sqrt{\mathcal{E}_b}$. By substituting 
$\mathcal{E}_b = \frac{1}{4} d_{12}^2$ into (5-2-5), we obtain 

$$
P_b = Q\left( \sqrt{\frac{d_{12}^2}{2N_0}} \right)
\tag{5-2-6}
$$

This expression illustrates the dependence of the error probability on the 
distance between the two signal points. 

FIGURE 5-2-3 Signal points for binary orthogonal signals. 
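The expressions (5-2-5) and (5-2-6) are straightforward to evaluate using the 
identity $Q(x) = \frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$. The sketch below is 
illustrative; the function names are not from the text, and the SNR per bit is 
passed in decibels as a convenience. 

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def pb_antipodal(ebn0_db):
    """Bit error probability (5-2-5) for a given Eb/N0 in dB."""
    ebn0 = 10 ** (ebn0_db / 10)
    return q_function(math.sqrt(2 * ebn0))

def pb_from_distance(d12, n0):
    """Bit error probability (5-2-6) from the distance between signal points."""
    return q_function(math.sqrt(d12 ** 2 / (2 * n0)))

for snr_db in (0.0, 4.0, 8.0):
    print(snr_db, pb_antipodal(snr_db))
```

With $\mathcal{E}_b = N_0 = 1$ the distance is $d_{12} = 2$, and 
`pb_from_distance(2.0, 1.0)` returns the same value as `pb_antipodal(0.0)`. 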

Next, let us evaluate the error probability for binary orthogonal signals. 
Recall that the signal vectors $\mathbf{s}_1$ and $\mathbf{s}_2$ are 
two-dimensional, as shown in Fig. 5-2-3, and may be expressed, according to 
(4-3-30), as 

$$
\begin{aligned}
\mathbf{s}_1 &= [\sqrt{\mathcal{E}_b} \;\;\; 0] \\
\mathbf{s}_2 &= [0 \;\;\; \sqrt{\mathcal{E}_b}]
\end{aligned}
\tag{5-2-7}
$$

where $\mathcal{E}_b$ denotes the energy for each of the waveforms. Note that 
the distance between these signal points is $d_{12} = \sqrt{2\mathcal{E}_b}$. 

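To evaluate the probability of error, let us assume that $\mathbf{s}_1$ was
transmitted. Then, the received vector at the output of the demodulator is

$$
\mathbf{r} = [\sqrt{\mathcal{E}_b} + n_1 \quad n_2]
\tag{5-2-8}
$$

We can now substitute for $\mathbf{r}$ into the correlation metrics given by
(5-1-44) to obtain $C(\mathbf{r}, \mathbf{s}_1)$ and $C(\mathbf{r}, \mathbf{s}_2)$.
Then, the probability of error is the probability that
$C(\mathbf{r}, \mathbf{s}_2) > C(\mathbf{r}, \mathbf{s}_1)$. Thus,

$$
P(e \mid \mathbf{s}_1) = P[C(\mathbf{r}, \mathbf{s}_2) > C(\mathbf{r}, \mathbf{s}_1)]
 = P[n_2 - n_1 > \sqrt{\mathcal{E}_b}]
\tag{5-2-9}
$$

Since $n_1$ and $n_2$ are zero-mean statistically independent gaussian random
variables each with variance $\tfrac{1}{2}N_0$, the random variable
$x = n_2 - n_1$ is zero-mean gaussian with variance $N_0$. Hence,

$$
P(n_2 - n_1 > \sqrt{\mathcal{E}_b})
 = \frac{1}{\sqrt{2\pi N_0}} \int_{\sqrt{\mathcal{E}_b}}^{\infty} e^{-x^2/2N_0}\,dx
 = \frac{1}{\sqrt{2\pi}} \int_{\sqrt{\mathcal{E}_b/N_0}}^{\infty} e^{-x^2/2}\,dx
 = Q\left(\sqrt{\frac{\mathcal{E}_b}{N_0}}\right)
\tag{5-2-10}
$$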


Due to symmetry, the same error probability is obtained when we assume that
$\mathbf{s}_2$ is transmitted. Consequently, the average error probability for
binary orthogonal signals is

$$
P_b = Q\left(\sqrt{\frac{\mathcal{E}_b}{N_0}}\right) = Q(\sqrt{\gamma_b})
\tag{5-2-11}
$$

where, by definition, $\gamma_b$ is the SNR per bit.

If we compare the probability of error for binary antipodal signals with that
for binary orthogonal signals, we find that orthogonal signals require a factor
of two increase in energy to achieve the same error probability as antipodal
signals. Since $10\log_{10} 2 = 3$ dB, we say that orthogonal signals are 3 dB
poorer than antipodal signals. The difference of 3 dB is simply due to the
distance between the two signal points, which is $d_{12}^2 = 2\mathcal{E}_b$ for
orthogonal signals, whereas $d_{12}^2 = 4\mathcal{E}_b$ for antipodal signals.

The error probability versus $10\log_{10} \mathcal{E}_b/N_0$ for these two types
of signals is shown in Fig. 5-2-4. As observed from this figure, at any given
error probability, the $\mathcal{E}_b/N_0$ required for orthogonal signals is
3 dB more than that for antipodal signals.
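The 3 dB gap can be checked numerically. The following is a minimal sketch, not part of the original text: it evaluates (5-2-5) and (5-2-11) using the identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$. The function names are illustrative choices.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x), written in terms of erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pb_antipodal(snr_per_bit_db: float) -> float:
    """Binary antipodal signals, (5-2-5): Q(sqrt(2 Eb/N0))."""
    gamma_b = 10.0 ** (snr_per_bit_db / 10.0)
    return q_function(math.sqrt(2.0 * gamma_b))


def pb_orthogonal(snr_per_bit_db: float) -> float:
    """Binary orthogonal signals, (5-2-11): Q(sqrt(Eb/N0))."""
    gamma_b = 10.0 ** (snr_per_bit_db / 10.0)
    return q_function(math.sqrt(gamma_b))


if __name__ == "__main__":
    shift_db = 10.0 * math.log10(2.0)  # about 3.01 dB
    for snr_db in (4.0, 7.0, 10.0):
        # The two columns agree: orthogonal signals need shift_db more SNR.
        print(snr_db, pb_antipodal(snr_db), pb_orthogonal(snr_db + shift_db))
```

Shifting the orthogonal curve by $10\log_{10} 2$ dB makes the argument of $Q$ identical in both expressions, which is why the two printed columns coincide.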


5-2-2 Probability of Error for M-ary Orthogonal Signals

For equal energy orthogonal signals, the optimum detector selects the signal
resulting in the largest cross correlation between the received vector
$\mathbf{r}$ and each of the $M$ possible transmitted signal vectors
$\{\mathbf{s}_m\}$, i.e.,

$$
C(\mathbf{r}, \mathbf{s}_m) = \mathbf{r} \cdot \mathbf{s}_m
 = \sum_{k=1}^{M} r_k s_{mk}, \qquad m = 1, 2, \ldots, M
\tag{5-2-12}
$$

FIGURE 5-2-4 Probability of error for binary signals.
To evaluate the probability of error, let us suppose that the signal
$\mathbf{s}_1$ is transmitted. Then the received signal vector is

$$
\mathbf{r} = [\sqrt{\mathcal{E}_s} + n_1 \quad n_2 \quad n_3 \quad \ldots \quad n_M]
\tag{5-2-13}
$$

where $n_1, n_2, \ldots, n_M$ are zero-mean, mutually statistically independent
gaussian random variables with equal variance $\sigma_n^2 = \tfrac{1}{2}N_0$. In
this case, the outputs from the bank of $M$ correlators are

$$
\begin{aligned}
C(\mathbf{r}, \mathbf{s}_1) &= \sqrt{\mathcal{E}_s}\,(\sqrt{\mathcal{E}_s} + n_1) \\
C(\mathbf{r}, \mathbf{s}_2) &= \sqrt{\mathcal{E}_s}\,n_2 \\
&\;\;\vdots \\
C(\mathbf{r}, \mathbf{s}_M) &= \sqrt{\mathcal{E}_s}\,n_M
\end{aligned}
\tag{5-2-14}
$$

Note that the scale factor $\sqrt{\mathcal{E}_s}$ may be eliminated from the
correlator outputs by dividing each output by $\sqrt{\mathcal{E}_s}$. Then, with
this normalization, the pdf of the first correlator output
($r_1 = \sqrt{\mathcal{E}_s} + n_1$) is


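$$
p_{r_1}(x_1) = \frac{1}{\sqrt{\pi N_0}}
 \exp\left[-\frac{(x_1 - \sqrt{\mathcal{E}_s})^2}{N_0}\right]
\tag{5-2-15}
$$

and the pdfs of the other $M - 1$ correlator outputs are

$$
p_{r_m}(x_m) = \frac{1}{\sqrt{\pi N_0}}\, e^{-x_m^2/N_0},
 \qquad m = 2, 3, \ldots, M
\tag{5-2-16}
$$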


It is mathematically convenient to first derive the probability that the
detector makes a correct decision. This is the probability that $r_1$ is larger
than each of the other $M - 1$ correlator outputs $n_2, n_3, \ldots, n_M$. This
probability may be expressed as

$$
P_c = \int_{-\infty}^{\infty}
 P(n_2 < r_1, n_3 < r_1, \ldots, n_M < r_1 \mid r_1)\, p(r_1)\, dr_1
\tag{5-2-17}
$$

where $P(n_2 < r_1, n_3 < r_1, \ldots, n_M < r_1 \mid r_1)$ denotes the joint
probability that $n_2, n_3, \ldots, n_M$ are all less than $r_1$, conditioned on
any given $r_1$. Then this joint probability is averaged over all $r_1$. Since
the $\{r_m\}$ are statistically independent, the joint probability factors into
a product of $M - 1$ marginal probabilities of the form

$$
P(n_m < r_1 \mid r_1) = \int_{-\infty}^{r_1} p_{r_m}(x_m)\, dx_m
 = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{r_1\sqrt{2/N_0}} e^{-x^2/2}\, dx,
 \qquad m = 2, 3, \ldots, M
\tag{5-2-18}
$$


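These probabilities are identical for $m = 2, 3, \ldots, M$, and, hence, the
joint probability under consideration is simply the result in (5-2-18) raised
to the $(M - 1)$th power. Thus, the probability of a correct decision is

$$
P_c = \int_{-\infty}^{\infty}
 \left( \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{r_1\sqrt{2/N_0}} e^{-x^2/2}\, dx \right)^{M-1}
 p(r_1)\, dr_1
\tag{5-2-19}
$$

and the probability of a ($k$-bit) symbol error is

$$
P_M = 1 - P_c
\tag{5-2-20}
$$

where

$$
P_M = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty}
 \left\{ 1 - \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-x^2/2}\, dx \right]^{M-1} \right\}
 \exp\left[ -\frac{1}{2} \left( y - \sqrt{\frac{2\mathcal{E}_s}{N_0}} \right)^2 \right] dy
\tag{5-2-21}
$$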


The same expression for the probability of error is obtained when any one
of the other $M - 1$ signals is transmitted. Since all the $M$ signals are
equally likely, the expression for $P_M$ given in (5-2-21) is the average
probability of a symbol error. This expression can be evaluated numerically.
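One way to carry out that numerical evaluation is sketched below. This is an illustrative example, not the method used for the figures in the text: it applies the trapezoid rule to (5-2-21), with the integration window and step count chosen as assumptions, and writes the inner integral as the standard normal CDF.

```python
import math


def standard_normal_cdf(y: float) -> float:
    """Inner integral of (5-2-21): (1/sqrt(2 pi)) * int_{-inf}^{y} exp(-x^2/2) dx."""
    return 0.5 * math.erfc(-y / math.sqrt(2.0))


def symbol_error_orthogonal(M: int, es_over_n0: float,
                            half_width: float = 10.0, steps: int = 4000) -> float:
    """Trapezoid-rule evaluation of (5-2-21) for M orthogonal signals.

    es_over_n0 is the SNR per symbol as a plain ratio (not dB). The window
    half_width and the number of steps are assumptions of this sketch.
    """
    mean = math.sqrt(2.0 * es_over_n0)
    lo = mean - half_width
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        f = (1.0 - standard_normal_cdf(y) ** (M - 1)) * math.exp(-0.5 * (y - mean) ** 2)
        total += f if 0 < i < steps else 0.5 * f  # half weight at the end points
    return total * h / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    for M in (2, 4, 16, 64):
        print(M, symbol_error_orthogonal(M, es_over_n0=10.0))
```

For $M = 2$ the result should reduce to the binary orthogonal value $Q(\sqrt{\mathcal{E}_s/N_0})$ of (5-2-11), which gives a convenient sanity check.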

In comparing the performance of various digital modulation methods, it is
desirable to have the probability of error expressed in terms of the SNR per
bit, $\mathcal{E}_b/N_0$, instead of the SNR per symbol, $\mathcal{E}_s/N_0$. With
$M = 2^k$, each symbol conveys $k$ bits of information, and hence
$\mathcal{E}_s = k\mathcal{E}_b$. Thus, (5-2-21) may be expressed in terms of
$\mathcal{E}_b/N_0$ by substituting for $\mathcal{E}_s$.

Sometimes, it is also desirable to convert the probability of a symbol error
into an equivalent probability of a binary digit error. For equiprobable
orthogonal signals, all symbol errors are equiprobable and occur with
probability

$$
\frac{P_M}{M - 1} = \frac{P_M}{2^k - 1}
\tag{5-2-22}
$$

Furthermore, there are $\binom{k}{n}$ ways in which $n$ bits out of $k$ may be in
error. Hence, the average number of bit errors per $k$-bit symbol is

$$
\sum_{n=1}^{k} n \binom{k}{n} \frac{P_M}{2^k - 1} = k\, \frac{2^{k-1}}{2^k - 1}\, P_M
\tag{5-2-23}
$$

and the average bit error probability is just the result in (5-2-23) divided by
$k$, the number of bits per symbol. Thus,

$$
P_b = \frac{2^{k-1}}{2^k - 1}\, P_M \approx \frac{P_M}{2}, \qquad k \gg 1
\tag{5-2-24}
$$
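The counting argument behind (5-2-22) to (5-2-24) can be reproduced directly. The sketch below is illustrative only (the function name is an assumption): it sums the expected number of bit errors over all error patterns and compares the result with the closed form.

```python
from math import comb


def bit_error_from_symbol_error(k: int, p_symbol: float) -> float:
    """Convert a symbol error probability to a bit error probability.

    Follows (5-2-22)-(5-2-24) for M = 2**k equiprobable orthogonal signals.
    """
    M = 2 ** k
    p_each = p_symbol / (M - 1)  # (5-2-22): each wrong symbol is equally likely
    # (5-2-23): comb(k, n) wrong symbols differ from the correct one in n bits
    avg_bit_errors = sum(n * comb(k, n) * p_each for n in range(1, k + 1))
    return avg_bit_errors / k    # (5-2-24)


if __name__ == "__main__":
    for k in (1, 2, 3, 6):
        closed_form = 2 ** (k - 1) / (2 ** k - 1)
        print(k, bit_error_from_symbol_error(k, 1.0), closed_form)
```

With `p_symbol = 1.0` the function returns the ratio $P_b/P_M$, which starts at 1 for $k = 1$ and approaches $\tfrac{1}{2}$ as $k$ grows.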


The graphs of the probability of a binary digit error as a function of the
SNR per bit, $\mathcal{E}_b/N_0$, are shown in Fig. 5-2-5 for $M = 2, 4, 8, 16, 32$
and 64. This figure illustrates that, by increasing the number $M$ of
waveforms, one can reduce the SNR per bit required to achieve a given
probability of a bit error. For example, to achieve a $P_b = 10^{-5}$, the
required SNR per bit is a little more than 12 dB for $M = 2$, but if $M$ is
increased to 64 signal waveforms ($k = 6$ bits/symbol), the required SNR per
bit is approximately 6 dB. Thus, a savings of over 6 dB (a factor-of-four
reduction) is realized in transmitter power (or energy) required to achieve a
$P_b = 10^{-5}$ by increasing $M$ from $M = 2$ to $M = 64$.

FIGURE 5-2-5 Probability of bit error for coherent detection of orthogonal
signals.

What is the minimum required $\mathcal{E}_b/N_0$ to achieve an arbitrarily small
probability of error as $M \to \infty$? This question is answered below.

A Union Bound on the Probability of Error  Let us investigate the effect
of increasing $M$ on the probability of error for orthogonal signals. To
simplify the mathematical development, we first derive an upper bound on the
probability of a symbol error that is much simpler than the exact form given in
(5-2-21).

Recall that the probability of error for binary orthogonal signals is given by
(5-2-11). Now, if we view the detector for $M$ orthogonal signals as one that
makes $M - 1$ binary decisions between the correlator output
$C(\mathbf{r}, \mathbf{s}_1)$ that contains the signal and the other $M - 1$
correlator outputs $C(\mathbf{r}, \mathbf{s}_m)$, $m = 2, 3, \ldots, M$, the
probability of error is upper-bounded by the union bound of the $M - 1$ events.
That is, if $E_i$ represents the event that
$C(\mathbf{r}, \mathbf{s}_i) > C(\mathbf{r}, \mathbf{s}_1)$ for $i \ne 1$ then we
have $P_M = P\left(\bigcup_{i=2}^{M} E_i\right) \le \sum_{i=2}^{M} P(E_i)$. Hence,

$$
P_M \le (M - 1) P_2 = (M - 1)\, Q\left(\sqrt{\mathcal{E}_s/N_0}\right)
 < M\, Q\left(\sqrt{\mathcal{E}_s/N_0}\right)
\tag{5-2-25}
$$
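This bound can be simplified further by upper-bounding
$Q(\sqrt{\mathcal{E}_s/N_0})$. We have

$$
Q\left(\sqrt{\mathcal{E}_s/N_0}\right) < e^{-\mathcal{E}_s/2N_0}
\tag{5-2-26}
$$

Thus,

$$
\begin{aligned}
P_M &< M e^{-\mathcal{E}_s/2N_0} = 2^k e^{-k\mathcal{E}_b/2N_0} \\
P_M &< e^{-k(\mathcal{E}_b/N_0 - 2\ln 2)/2}
\end{aligned}
\tag{5-2-27}
$$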


As $k \to \infty$, or equivalently, as $M \to \infty$, the probability of error
approaches zero exponentially, provided that $\mathcal{E}_b/N_0$ is greater than
$2\ln 2$, i.e.,

$$
\frac{\mathcal{E}_b}{N_0} > 2\ln 2 = 1.39 \quad (1.42\ \text{dB})
\tag{5-2-28}
$$

The simple upper bound on the probability of error given by (5-2-27)
implies that, as long as the SNR per bit exceeds 1.42 dB, we can achieve an
arbitrarily low $P_M$. However, this union bound is not a very tight upper bound
at a sufficiently low SNR due to the fact that the upper bound for the Q
function in (5-2-26) is loose. In fact, by more elaborate bounding techniques,
it is shown in Chapter 7 that the upper bound in (5-2-27) is sufficiently tight
for $\mathcal{E}_b/N_0 > 4\ln 2$. For $\mathcal{E}_b/N_0 < 4\ln 2$, a tighter upper
bound on $P_M$ is

$$
P_M < 2 e^{-k\left(\sqrt{\mathcal{E}_b/N_0} - \sqrt{\ln 2}\right)^2}
\tag{5-2-29}
$$

Consequently, $P_M \to 0$ as $k \to \infty$, provided that

$$
\frac{\mathcal{E}_b}{N_0} > \ln 2 = 0.693 \quad (-1.6\ \text{dB})
\tag{5-2-30}
$$

Hence, $-1.6$ dB is the minimum required SNR per bit to achieve an arbitrarily
small probability of error in the limit as $k \to \infty$ ($M \to \infty$). This
minimum SNR per bit ($-1.6$ dB) is called the Shannon limit for an additive
white Gaussian noise channel.
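The two thresholds and the behavior of the two bounds between them can be illustrated with a few lines of code. This is a sketch with illustrative function names, not part of the original development: it converts $2\ln 2$ and $\ln 2$ to decibels and evaluates (5-2-27) and (5-2-29) at an SNR per bit that lies between the two thresholds.

```python
import math


def to_db(ratio: float) -> float:
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)


def union_bound(k: int, eb_over_n0: float) -> float:
    """Right-hand side of (5-2-27): exp(-k (Eb/N0 - 2 ln 2) / 2)."""
    return math.exp(-0.5 * k * (eb_over_n0 - 2.0 * math.log(2.0)))


def refined_bound(k: int, eb_over_n0: float) -> float:
    """Right-hand side of (5-2-29), stated in the text for Eb/N0 < 4 ln 2."""
    gap = math.sqrt(eb_over_n0) - math.sqrt(math.log(2.0))
    return 2.0 * math.exp(-k * gap * gap)


if __name__ == "__main__":
    print("2 ln 2 in dB:", to_db(2.0 * math.log(2.0)))  # about 1.42 dB
    print("ln 2 in dB:  ", to_db(math.log(2.0)))        # about -1.59 dB
    # Eb/N0 = 1.0 (0 dB) lies between ln 2 and 2 ln 2.
    for k in (10, 50, 200):
        print(k, union_bound(k, 1.0), refined_bound(k, 1.0))
```

At 0 dB the right-hand side of (5-2-27) grows with $k$ and so says nothing useful, while (5-2-29) still decays, which is the reason the limit moves from 1.42 dB down to $-1.6$ dB.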


5-2-3 Probability of Error for M-ary Biorthogonal Signals

As indicated in Section 4-3, a set of $M = 2^k$ biorthogonal signals are
constructed from $\tfrac{1}{2}M$ orthogonal signals by including the negatives of
the orthogonal signals. Thus, we achieve a reduction in the complexity of the
demodulator for the biorthogonal signals relative to that for orthogonal
signals, since the former is implemented with $\tfrac{1}{2}M$ cross-correlators
or matched filters, whereas the latter requires $M$ matched filters or
cross-correlators.

To evaluate the probability of error for the optimum detector, let us assume
that the signal $s_1(t)$ corresponding to the vector
$\mathbf{s}_1 = [\sqrt{\mathcal{E}_s} \;\; 0 \;\; 0 \;\; \ldots \;\; 0]$ was
transmitted. Then, the received signal vector is

$$
\mathbf{r} = [\sqrt{\mathcal{E}_s} + n_1 \quad n_2 \quad \ldots \quad n_{M/2}]
\tag{5-2-31}
$$

where the $\{n_m\}$ are zero-mean, mutually statistically independent and
identically distributed gaussian random variables with variance
$\sigma_n^2 = \tfrac{1}{2}N_0$. The optimum detector decides in favor of the
signal corresponding to the largest in magnitude of the cross-correlators

$$
C(\mathbf{r}, \mathbf{s}_m) = \mathbf{r} \cdot \mathbf{s}_m
 = \sum_{k=1}^{M/2} r_k s_{mk}, \qquad m = 1, 2, \ldots, \tfrac{1}{2}M
\tag{5-2-32}
$$

while the sign of this largest term is used to decide whether $s_m(t)$ or
$-s_m(t)$ was transmitted. According to this decision rule, the probability of
a correct decision is equal to the probability that
$r_1 = \sqrt{\mathcal{E}_s} + n_1 > 0$ and $r_1$ exceeds $|r_m| = |n_m|$ for
$m = 2, 3, \ldots, \tfrac{1}{2}M$. But
is similar to that for orthogonal signals (see Fig. 5-2-5). However, in this case,
the probability of error for $M = 4$ is greater than that for $M = 2$. This is due
to the fact that we have plotted the symbol error probability $P_M$ in
Fig. 5-2-6. If we plotted the equivalent bit error probability, we should find
that the graphs for $M = 2$ and $M = 4$ coincide. As in the case of orthogonal
signals, as $M \to \infty$ (or $k \to \infty$), the minimum required
$\mathcal{E}_b/N_0$ to achieve arbitrarily small probability of error is
$-1.6$ dB, the Shannon limit.


5-2-4 Probability of Error for Simplex Signals 

Next we consider the probability of error for $M$ simplex signals. Recall from
Section 4-3 that simplex signals are a set of $M$ equally correlated signals with
mutual cross-correlation coefficient $\rho_{mn} = -1/(M - 1)$. These signals have
the same minimum separation of $\sqrt{2\mathcal{E}_s}$ between adjacent signal
points in $M$-dimensional space as orthogonal signals. They achieve this mutual
separation with a transmitted energy of $\mathcal{E}_s(M - 1)/M$, which is less
than that required for orthogonal signals by a factor of $(M - 1)/M$.
Consequently, the probability of error for simplex signals is identical to the
probability of error for orthogonal signals, but this performance is achieved
with a saving of

$$
10\log(1 - \rho_{mn}) = 10\log\frac{M}{M - 1}\ \text{dB}
\tag{5-2-35}
$$

in SNR. For $M = 2$, the saving is 3 dB. However, as $M$ is increased, the saving
in SNR approaches 0 dB.
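To see how quickly the saving in (5-2-35) shrinks, one can tabulate it for a few values of $M$. The short sketch below is illustrative; the function name is an assumption.

```python
import math


def simplex_saving_db(M: int) -> float:
    """SNR saving of simplex over orthogonal signals, (5-2-35), in dB."""
    return 10.0 * math.log10(M / (M - 1))


if __name__ == "__main__":
    for M in (2, 4, 8, 16, 64):
        print(M, round(simplex_saving_db(M), 3))
```

The saving is about 3 dB for $M = 2$ and falls below 0.1 dB by $M = 64$.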


5-2-5 Probability of Error for M-ary Binary-Coded Signals

We have shown in Section 4-3 that binary-coded signal waveforms are
represented by the signal vectors

$$
\mathbf{s}_m = [s_{m1} \quad s_{m2} \quad \ldots \quad s_{mN}],
 \qquad m = 1, 2, \ldots, M
$$

where $s_{mj} = \pm\sqrt{\mathcal{E}/N}$ for all $m$ and $j$. $N$ is the block
length of the code, and is also the dimension of the $M$ signal waveforms.

If $d_{\min}^{(e)}$ is the minimum euclidean distance of the $M$ signal waveforms
then the probability of a symbol error is upper-bounded as

$$
P_M < (M - 1) P_b
 = (M - 1)\, Q\left(\sqrt{\frac{(d_{\min}^{(e)})^2}{2N_0}}\right)
 < 2^k \exp\left[-\frac{(d_{\min}^{(e)})^2}{4N_0}\right]
\tag{5-2-36}
$$

The value of the minimum euclidean distance will depend on the selection of
the code words, i.e., the design of the code.


5-2-6 Probability of Error for M-ary PAM

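Recall that $M$-ary PAM signals are represented geometrically as $M$
one-dimensional signal points with value

$$
s_m = \sqrt{\tfrac{1}{2}\mathcal{E}_g}\, A_m, \qquad m = 1, 2, \ldots, M
\tag{5-2-37}
$$

where $\mathcal{E}_g$ is the energy of the basic signal pulse $g(t)$. The
amplitude values may be expressed as

$$
A_m = (2m - 1 - M)d, \qquad m = 1, 2, \ldots, M
\tag{5-2-38}
$$

where the euclidean distance between adjacent signal points is
$d\sqrt{2\mathcal{E}_g}$. Assuming equally probable signals, the average energy
is

$$
\mathcal{E}_{av} = \frac{1}{M} \sum_{m=1}^{M} \mathcal{E}_m
 = \frac{d^2 \mathcal{E}_g}{2M} \sum_{m=1}^{M} (2m - 1 - M)^2
 = \frac{d^2 \mathcal{E}_g}{2M} \left[ \tfrac{1}{3} M (M^2 - 1) \right]
 = \tfrac{1}{6} (M^2 - 1)\, d^2 \mathcal{E}_g
\tag{5-2-39}
$$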


Equivalently, we may characterize these signals in terms of their average
power, which is

$$
P_{av} = \frac{\mathcal{E}_{av}}{T} = \tfrac{1}{6} (M^2 - 1)\, \frac{d^2 \mathcal{E}_g}{T}
\tag{5-2-40}
$$

The average probability of error for $M$-ary PAM can be determined from
the decision rule that maximizes the correlation metrics given by (5-1-44).
Equivalently, the detector compares the demodulator output $r$ with a set of
$M - 1$ thresholds, which are placed at the midpoints of successive amplitude
levels, as shown in Fig. 5-2-7. Thus, a decision is made in favor of the
amplitude level that is closest to $r$.

The placing of the thresholds as shown in Fig. 5-2-7 helps in evaluating the
probability of error. We note that if the $m$th amplitude level is transmitted,
the demodulator output is

$$
r = s_m + n = \sqrt{\tfrac{1}{2}\mathcal{E}_g}\, A_m + n
\tag{5-2-41}
$$

FIGURE 5-2-7 Placement of thresholds at midpoints of successive amplitude
levels.
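where the noise variable $n$ has zero mean and variance
$\sigma_n^2 = \tfrac{1}{2}N_0$. On the basis that all amplitude levels are equally
likely a priori, the average probability of a symbol error is simply the
probability that the noise variable $n$ exceeds in magnitude one-half of the
distance between levels. However, when either one of the two outside levels
$\pm(M - 1)$ is transmitted, an error can occur in one direction only. Thus, we
have

$$
\begin{aligned}
P_M &= \frac{M - 1}{M}\, P\left( |r - s_m| > d\sqrt{\tfrac{1}{2}\mathcal{E}_g} \right) \\
&= \frac{M - 1}{M}\, \frac{2}{\sqrt{\pi N_0}}
   \int_{d\sqrt{\mathcal{E}_g/2}}^{\infty} e^{-x^2/N_0}\, dx \\
&= \frac{M - 1}{M}\, \frac{2}{\sqrt{2\pi}}
   \int_{\sqrt{d^2 \mathcal{E}_g/N_0}}^{\infty} e^{-x^2/2}\, dx \\
&= \frac{2(M - 1)}{M}\, Q\left( \sqrt{\frac{d^2 \mathcal{E}_g}{N_0}} \right)
\end{aligned}
\tag{5-2-42}
$$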


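The error probability in (5-2-42) can also be expressed in terms of the average
transmitted power. From (5-2-40), we note that

$$
d^2 \mathcal{E}_g = \frac{6}{M^2 - 1}\, P_{av} T
\tag{5-2-43}
$$

By substituting for $d^2 \mathcal{E}_g$ in (5-2-42), we obtain the average
probability of a symbol error for PAM in terms of the average power as

$$
P_M = \frac{2(M - 1)}{M}\, Q\left( \sqrt{\frac{6 P_{av} T}{(M^2 - 1) N_0}} \right)
\tag{5-2-44}
$$

or, equivalently,

$$
P_M = \frac{2(M - 1)}{M}\, Q\left( \sqrt{\frac{6 \mathcal{E}_{av}}{(M^2 - 1) N_0}} \right)
\tag{5-2-45}
$$

where $\mathcal{E}_{av} = P_{av} T$ is the average energy.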

In plotting the probability of a symbol error for $M$-ary signals such as
$M$-ary PAM, it is customary to use the SNR per bit as the basic parameter. Since
$T = kT_b$ and $k = \log_2 M$, (5-2-45) may be expressed as

$$
P_M = \frac{2(M - 1)}{M}\,
 Q\left( \sqrt{\frac{(6\log_2 M)\, \mathcal{E}_{b\,av}}{(M^2 - 1) N_0}} \right)
\tag{5-2-46}
$$

where $\mathcal{E}_{b\,av} = P_{av} T_b$ is the average bit energy and
$\mathcal{E}_{b\,av}/N_0$ is the average SNR per bit. Figure 5-2-8 illustrates the
probability of a symbol error as a function of
$10\log_{10} \mathcal{E}_{b\,av}/N_0$, with $M$ as a parameter. Note that the case
$M = 2$ corresponds to the error probability for binary antipodal signals. Also
observe that the SNR per bit increases by over 4 dB for every factor-of-two
increase in $M$. For large $M$, the additional SNR per bit required to increase
$M$ by a factor of two approaches 6 dB.

FIGURE 5-2-8 Probability of a symbol error for PAM.
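Equation (5-2-46) is simple enough to evaluate directly. The following sketch (function names are illustrative assumptions) computes the PAM symbol error probability from the average SNR per bit and confirms that $M = 2$ reduces to the binary antipodal result.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pam_symbol_error(M: int, eb_av_over_n0: float) -> float:
    """Symbol error probability of M-ary PAM, equation (5-2-46).

    eb_av_over_n0 is the average SNR per bit as a plain ratio (not dB).
    """
    k = math.log2(M)
    argument = math.sqrt(6.0 * k * eb_av_over_n0 / (M * M - 1.0))
    return 2.0 * (M - 1.0) / M * q_function(argument)


if __name__ == "__main__":
    snr = 10.0 ** (10.0 / 10.0)  # 10 dB average SNR per bit
    for M in (2, 4, 8, 16):
        print(M, pam_symbol_error(M, snr))
```

For $M = 2$ the prefactor is 1 and the argument becomes $\sqrt{2\mathcal{E}_b/N_0}$, matching (5-2-5).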


5-2-7 Probability of Error for M-ary PSK

Recall from Section 4-3 that digital phase-modulated signal waveforms may be
expressed as

$$
s_m(t) = g(t) \cos\left[ 2\pi f_c t + \frac{2\pi}{M}(m - 1) \right],
 \qquad 1 \le m \le M, \quad 0 \le t \le T
\tag{5-2-47}
$$

and have the vector representation

$$
\mathbf{s}_m = \left[ \sqrt{\mathcal{E}_s} \cos\frac{2\pi}{M}(m - 1) \quad
 \sqrt{\mathcal{E}_s} \sin\frac{2\pi}{M}(m - 1) \right]
\tag{5-2-48}
$$

where $\mathcal{E}_s = \tfrac{1}{2}\mathcal{E}_g$ is the energy in each of the
waveforms and $g(t)$ is the pulse shape of the transmitted signal. Since the
signal waveforms have equal energy, the optimum detector for the AWGN channel
given by (5-1-44) computes the correlation metrics

$$
C(\mathbf{r}, \mathbf{s}_m) = \mathbf{r} \cdot \mathbf{s}_m,
 \qquad m = 1, 2, \ldots, M
\tag{5-2-49}
$$

In other words, the received signal vector $\mathbf{r} = [r_1 \;\; r_2]$ is
projected onto each of the $M$ possible signal vectors and a decision is made in
favor of the signal with the largest projection.

The correlation detector described above is equivalent to a phase detector
that computes the phase of the received signal from $\mathbf{r}$ and selects the
signal vector $\mathbf{s}_m$ whose phase is closest to $\mathbf{r}$. Since the
phase of $\mathbf{r}$ is

$$
\Theta_r = \tan^{-1} \frac{r_2}{r_1}
\tag{5-2-50}
$$

we will determine the pdf of $\Theta_r$, from which we shall compute the
probability of error.

Let us consider the case in which the transmitted signal phase is
$\Theta_r = 0$, corresponding to the signal $s_1(t)$. Hence, the transmitted
signal vector is

$$
\mathbf{s}_1 = [\sqrt{\mathcal{E}_s} \quad 0]
\tag{5-2-51}
$$

and the received signal vector has components

$$
r_1 = \sqrt{\mathcal{E}_s} + n_1, \qquad r_2 = n_2
\tag{5-2-52}
$$

Because $n_1$ and $n_2$ are jointly gaussian random variables, it follows that
$r_1$ and $r_2$ are jointly gaussian random variables with
$E(r_1) = \sqrt{\mathcal{E}_s}$, $E(r_2) = 0$, and
$\sigma_{r_1}^2 = \sigma_{r_2}^2 = \tfrac{1}{2}N_0 = \sigma_r^2$. Consequently,

$$
p_r(r_1, r_2) = \frac{1}{2\pi\sigma_r^2}
 \exp\left[ -\frac{(r_1 - \sqrt{\mathcal{E}_s})^2 + r_2^2}{2\sigma_r^2} \right]
\tag{5-2-53}
$$


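The pdf of the phase $\Theta_r$ is obtained by a change in variables from
$(r_1, r_2)$ to

$$
V = \sqrt{r_1^2 + r_2^2}, \qquad \Theta_r = \tan^{-1}(r_2/r_1)
\tag{5-2-54}
$$

This yields the joint pdf

$$
p_{V,\Theta_r}(V, \Theta_r) = \frac{V}{2\pi\sigma_r^2}
 \exp\left( -\frac{V^2 + \mathcal{E}_s - 2\sqrt{\mathcal{E}_s}\, V \cos\Theta_r}{2\sigma_r^2} \right)
$$

Integration of $p_{V,\Theta_r}(V, \Theta_r)$ over the range of $V$ yields
$p_{\Theta_r}(\Theta_r)$. That is, after normalizing $V$ by
$\sigma_r = \sqrt{N_0/2}$ and completing the square in the exponent,

$$
p_{\Theta_r}(\Theta_r) = \int_0^{\infty} p_{V,\Theta_r}(V, \Theta_r)\, dV
 = \frac{1}{2\pi}\, e^{-\gamma_s \sin^2\Theta_r}
   \int_0^{\infty} V e^{-(V - \sqrt{2\gamma_s}\cos\Theta_r)^2/2}\, dV
\tag{5-2-55}
$$

where for convenience, we have defined the symbol SNR as
$\gamma_s = \mathcal{E}_s/N_0$. Figure 5-2-9 illustrates $p_{\Theta_r}(\Theta_r)$
for several values of the SNR parameter $\gamma_s$ when the transmitted phase is
zero. Note that $p_{\Theta_r}(\Theta_r)$ becomes narrower and more peaked about
$\Theta_r = 0$ as the SNR $\gamma_s$ increases.

FIGURE 5-2-9 Probability density function $p_{\Theta_r}(\Theta_r)$ for
$\gamma_s = 1, 2, 4$ and 10.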

When $s_1(t)$ is transmitted, a decision error is made if the noise causes the
phase to fall outside the range $-\pi/M \le \Theta_r \le \pi/M$. Hence, the
probability of a symbol error is

$$
P_M = 1 - \int_{-\pi/M}^{\pi/M} p_{\Theta_r}(\Theta_r)\, d\Theta_r
\tag{5-2-56}
$$

In general, the integral of $p_{\Theta_r}(\Theta_r)$ does not reduce to a simple
form and must be evaluated numerically, except for $M = 2$ and $M = 4$.
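A minimal sketch of such a numerical evaluation follows. It is an illustration under stated assumptions rather than the procedure used for the figures: it normalizes $N_0 = 1$ (so $\sigma_r^2 = \tfrac{1}{2}$ and $\mathcal{E}_s = \gamma_s$), integrates the joint pdf of $(V, \Theta_r)$ over $V$ with the trapezoid rule, and then integrates the resulting phase pdf over $|\Theta_r| \le \pi/M$ as in (5-2-56). Grid sizes and the truncation point of the $V$ integral are arbitrary choices.

```python
import math


def psk_symbol_error(M: int, gamma_s: float,
                     n_theta: int = 200, n_v: int = 1500) -> float:
    """Symbol error probability of M-ary PSK via (5-2-56), trapezoid rule.

    gamma_s is the SNR per symbol Es/N0 as a plain ratio. With N0 = 1 the
    noise variance per component is 0.5 and Es equals gamma_s.
    """
    sigma2 = 0.5
    root_es = math.sqrt(gamma_s)
    v_max = root_es + 10.0 * math.sqrt(sigma2)  # truncate the V integral
    dv = v_max / n_v
    dtheta = (2.0 * math.pi / M) / n_theta

    def phase_pdf(theta: float) -> float:
        """Integrate the joint pdf of (V, Theta_r) over V at fixed theta."""
        c = math.cos(theta)
        total = 0.0
        for i in range(n_v + 1):
            v = i * dv
            exponent = -(v * v + gamma_s - 2.0 * root_es * v * c) / (2.0 * sigma2)
            f = v / (2.0 * math.pi * sigma2) * math.exp(exponent)
            total += f if 0 < i < n_v else 0.5 * f
        return total * dv

    p_correct = 0.0
    for j in range(n_theta + 1):
        theta = -math.pi / M + j * dtheta
        weight = 1.0 if 0 < j < n_theta else 0.5
        p_correct += weight * phase_pdf(theta)
    return 1.0 - p_correct * dtheta


if __name__ == "__main__":
    for M in (2, 4, 8):
        print(M, psk_symbol_error(M, gamma_s=4.0))
```

For $M = 2$ the result should agree with the antipodal value $Q(\sqrt{2\gamma_s})$, which provides a check on the integration.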

For binary phase modulation, the two signals $s_1(t)$ and $s_2(t)$ are antipodal,
and, hence, the error probability is

$$
P_2 = Q\left( \sqrt{\frac{2\mathcal{E}_b}{N_0}} \right)
\tag{5-2-57}
$$


When $M = 4$, we have in effect two binary phase-modulation signals in phase
quadrature. Since there is no crosstalk or interference between the signals on
the two quadrature carriers, the bit error probability is identical to that in
(5-2-57). On the other hand, the symbol error probability for $M = 4$ is
determined by noting that

$$
P_c = (1 - P_2)^2 = \left[ 1 - Q\left( \sqrt{\frac{2\mathcal{E}_b}{N_0}} \right) \right]^2
\tag{5-2-58}
$$

where $P_c$ is the probability of a correct decision for the 2-bit symbol. The
result (5-2-58) follows from the statistical independence of the noise on the
quadrature carriers. Therefore, the symbol error probability for $M = 4$ is

$$
P_4 = 1 - P_c
 = 2 Q\left( \sqrt{\frac{2\mathcal{E}_b}{N_0}} \right)
   \left[ 1 - \tfrac{1}{2} Q\left( \sqrt{\frac{2\mathcal{E}_b}{N_0}} \right) \right]
\tag{5-2-59}
$$


For $M > 4$, the symbol error probability $P_M$ is obtained by numerically
integrating (5-2-55). Figure 5-2-10 illustrates this error probability as a
function of the SNR per bit for $M = 2, 4, 8, 16$, and 32. The graphs clearly
illustrate the penalty in SNR per bit as $M$ increases beyond $M = 4$. For
example, at $P_M = 10^{-5}$ the difference between $M = 4$ and $M = 8$ is
approximately 4 dB, and the difference between $M = 8$ and $M = 16$ is
approximately 5 dB. For large values of $M$, doubling the number of phases
requires an additional 6 dB/bit to achieve the same performance.

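An approximation to the error probability for large values of $M$ and for
large SNR may be obtained by first approximating $p_{\Theta_r}(\Theta_r)$. For
$\mathcal{E}_s/N_0 \gg 1$ and $|\Theta_r| \le \tfrac{1}{2}\pi$,
$p_{\Theta_r}(\Theta_r)$ is well approximated as

$$
p_{\Theta_r}(\Theta_r) \approx \sqrt{\frac{\gamma_s}{\pi}}\, \cos\Theta_r\,
 e^{-\gamma_s \sin^2\Theta_r}
\tag{5-2-60}
$$

By substituting for $p_{\Theta_r}(\Theta_r)$ in (5-2-56) and performing the change
in variable from $\Theta_r$ to $u = \sqrt{2\gamma_s}\sin\Theta_r$, we find that

$$
\begin{aligned}
P_M &\approx 1 - \int_{-\pi/M}^{\pi/M} \sqrt{\frac{\gamma_s}{\pi}}\, \cos\Theta_r\,
 e^{-\gamma_s \sin^2\Theta_r}\, d\Theta_r \\
&\approx \frac{2}{\sqrt{2\pi}} \int_{\sqrt{2\gamma_s}\sin(\pi/M)}^{\infty} e^{-u^2/2}\, du \\
&= 2Q\left( \sqrt{2\gamma_s}\, \sin\frac{\pi}{M} \right)
 = 2Q\left( \sqrt{2k\gamma_b}\, \sin\frac{\pi}{M} \right)
\end{aligned}
\tag{5-2-61}
$$

where $k = \log_2 M$ and $\gamma_s = k\gamma_b$. Note that this approximation to
the error probability is good for all values of $M$. For example, when $M = 2$
and $M = 4$, we have $P_2 = P_4 = 2Q(\sqrt{2\gamma_b})$, which compares favorably
(a factor-of-two difference) with the exact probability given by (5-2-57).

FIGURE 5-2-10 Probability of a symbol error for PSK signals.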
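The quality of the approximation (5-2-61) is easy to examine for $M = 4$, where the exact result (5-2-59) is available. The sketch below uses illustrative function names and compares the two expressions.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def psk_symbol_error_approx(M: int, gamma_b: float) -> float:
    """Approximation (5-2-61): 2 Q(sqrt(2 k gamma_b) sin(pi / M))."""
    k = math.log2(M)
    return 2.0 * q_function(math.sqrt(2.0 * k * gamma_b) * math.sin(math.pi / M))


def qpsk_symbol_error_exact(gamma_b: float) -> float:
    """Exact four-phase result (5-2-59)."""
    q = q_function(math.sqrt(2.0 * gamma_b))
    return 2.0 * q * (1.0 - 0.5 * q)


if __name__ == "__main__":
    for snr_db in (4.0, 8.0, 10.0):
        gamma_b = 10.0 ** (snr_db / 10.0)
        print(snr_db, psk_symbol_error_approx(4, gamma_b),
              qpsk_symbol_error_exact(gamma_b))
```

For $M = 4$ the two values differ only by the factor $1 - \tfrac{1}{2}Q(\sqrt{2\gamma_b})$, while for $M = 2$ the approximation is exactly twice the value in (5-2-57).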

The equivalent bit error probability for $M$-ary PSK is rather tedious to
derive due to its dependence on the mapping of $k$-bit symbols into the
corresponding signal phases. When a Gray code is used in the mapping, two
$k$-bit symbols corresponding to adjacent signal phases differ in only a single
bit. Since the most probable errors due to noise result in the erroneous
selection of an adjacent phase to the true phase, most $k$-bit symbol errors
contain only a single-bit error. Hence, the equivalent bit error probability for
$M$-ary PSK is well approximated as

$$
P_b \approx \frac{1}{k} P_M
\tag{5-2-62}
$$

Our treatment of the demodulation of PSK signals assumed that the
demodulator had a perfect estimate of the carrier phase available. In practice,
however, the carrier phase is extracted from the received signal by performing
some nonlinear operation that introduces a phase ambiguity. For example, in
binary PSK, the signal is often squared in order to remove the modulation, and
the double-frequency component that is generated is filtered and divided by 2
in frequency in order to extract an estimate of the carrier frequency and phase
$\phi$. These operations result in a phase ambiguity of 180° in the carrier
phase. Similarly, in four-phase PSK, the received signal is raised to the fourth
power in order to remove the digital modulation, and the resulting fourth
harmonic of the carrier frequency is filtered and divided by 4 in order to
extract the carrier component. These operations yield a carrier frequency
component containing the estimate of the carrier phase $\phi$, but there are
phase ambiguities of ±90° and 180° in the phase estimate. Consequently, we do
not have an absolute estimate of the carrier phase for demodulation.
The phase ambiguity problem resulting from the estimation of the carrier
phase $\phi$ can be overcome by encoding the information in phase differences
between successive signal transmissions as opposed to absolute phase encod- 
ing. For example, in binary PSK, the information bit 1 may be transmitted by 
shifting the phase of the carrier by 180° relative to the previous carrier phase, 
while the information bit 0 is transmitted by a zero phase shift relative to the 
phase in the previous signaling interval. In four-phase PSK, the relative phase 
shifts between successive intervals are 0, 90°, 180°, and -90°, corresponding to 
the information bits 00, 01, 11, and 10, respectively. The generalization to 
M >4 phases is straightforward. The PSK signals resulting from the encoding 
process are said to be differentially encoded. The encoding is performed by a 
relatively simple logic circuit preceding the modulator. 

Demodulation of the differentially encoded PSK signal is performed as 
described above, by ignoring the phase ambiguities. Thus, the received signal is 
demodulated and detected to one of the M possible transmitted phases in each 
signaling interval. Following the detector is a relatively simple phase com- 
parator that compares the phases of the demodulated signal over two 
consecutive intervals in order to extract the information. 

Coherent demodulation of differentially encoded PSK results in a higher
probability of error than the error probability derived for absolute phase
encoding. With differentially encoded PSK, an error in the demodulated phase
of the signal in any given interval will usually result in decoding errors over
two consecutive signaling intervals. This is especially the case for error
probabilities below 0.1. Therefore, the probability of error in differentially
encoded $M$-ary PSK is approximately twice the probability of error for $M$-ary
PSK with absolute phase encoding. However, this factor-of-two increase in the
error probability translates into a relatively small loss in SNR.
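The binary case of differential encoding can be sketched in a few lines. The example below is illustrative only: phases are represented as indices (0 for 0°, 1 for 180°), the function names are assumptions, and noise is modeled simply by flipping one detected phase. It shows that a 180° ambiguity leaves the decoded bits unchanged and that a single phase error produces two bit errors.

```python
def differential_encode(bits, initial_phase=0):
    """Bit 1 shifts the carrier phase by 180 degrees, bit 0 leaves it unchanged.

    Returns the sequence of absolute phase indices, including the reference.
    """
    phases = [initial_phase]
    for b in bits:
        phases.append((phases[-1] + b) % 2)
    return phases


def differential_decode(phases):
    """Recover the bits from phase differences between successive intervals."""
    return [(cur - prev) % 2 for prev, cur in zip(phases, phases[1:])]


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0]
    tx = differential_encode(bits)

    # A 180-degree ambiguity in the carrier estimate flips every detected phase.
    flipped = [(p + 1) % 2 for p in tx]
    print(differential_decode(tx) == bits, differential_decode(flipped) == bits)

    # One wrongly detected phase corrupts the two differences that involve it.
    rx = list(tx)
    rx[2] ^= 1
    errors = sum(a != b for a, b in zip(differential_decode(rx), bits))
    print("bit errors from one phase error:", errors)
```

The doubling of errors seen here is the source of the factor of two in the error probability mentioned above.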

5-2-8 Differential PSK (DPSK) and its Performance 

A differentially encoded phase-modulated signal also allows another type of
demodulation that does not require the estimation of the carrier phase.†
Instead, the received signal in any given signaling interval is compared to the
phase of the received signal from the preceding signaling interval. To
elaborate, suppose that we demodulate the differentially encoded signal by
multiplying $r(t)$ by $\cos 2\pi f_c t$ and $\sin 2\pi f_c t$ and integrating the
two products over the interval $T$. At the $k$th signaling interval, the
demodulator output is

$$
\mathbf{r}_k = [\sqrt{\mathcal{E}_s}\cos(\theta_k - \phi) + n_{k1} \quad
 \sqrt{\mathcal{E}_s}\sin(\theta_k - \phi) + n_{k2}]
$$

or, equivalently,

$$
r_k = \sqrt{\mathcal{E}_s}\, e^{j(\theta_k - \phi)} + n_k
\tag{5-2-63}
$$

† Because no phase estimation is required, DPSK is often considered to be a
noncoherent communication technique. We take the view that DPSK represents a
form of digital phase modulation in the extreme case where the phase estimate
is derived only from the previous symbol interval.
where $\theta_k$ is the phase angle of the transmitted signal at the $k$th
signaling interval, $\phi$ is the carrier phase, and $n_k = n_{k1} + jn_{k2}$ is
the noise vector. Similarly, the received signal vector at the output of the
demodulator in the preceding signaling interval is

$$
r_{k-1} = \sqrt{\mathcal{E}_s}\, e^{j(\theta_{k-1} - \phi)} + n_{k-1}
\tag{5-2-64}
$$

The decision variable for the phase detector is the phase difference between
these two complex numbers. Equivalently, we can project $r_k$ onto $r_{k-1}$ and
use the phase of the resulting complex number; that is,

$$
r_k r_{k-1}^* = \mathcal{E}_s e^{j(\theta_k - \theta_{k-1})}
 + \sqrt{\mathcal{E}_s}\, e^{j(\theta_k - \phi)} n_{k-1}^*
 + \sqrt{\mathcal{E}_s}\, e^{-j(\theta_{k-1} - \phi)} n_k
 + n_k n_{k-1}^*
\tag{5-2-65}
$$

which, in the absence of noise, yields the phase difference
$\theta_k - \theta_{k-1}$. Thus, the mean value of $r_k r_{k-1}^*$ is independent
of the carrier phase. Differentially encoded PSK signaling that is demodulated
and detected as described above is called differential PSK (DPSK).

The demodulation and detection of DPSK using matched filters is illustrated
in Figure 5-2-11. If the pulse $g(t)$ is rectangular, the matched filters may be
replaced by integrate-and-dump filters.

Let us now consider the evaluation of the error probability performance of a
DPSK demodulator and detector. The derivation of the exact value of the
probability of error for $M$-ary DPSK is extremely difficult, except for
$M = 2$. The major difficulty is encountered in the determination of the pdf for
the phase of the random variable $r_k r_{k-1}^*$, given by (5-2-65). However, an
approximation to the performance of DPSK is easily obtained, as we now
demonstrate.

Without loss of generality, suppose the phase difference
$\theta_k - \theta_{k-1} = 0$. Furthermore, the exponential factors
$e^{-j(\theta_{k-1} - \phi)}$ and $e^{j(\theta_k - \phi)}$ in (5-2-65) can be
absorbed into the gaussian noise components $n_{k-1}$ and $n_k$, without changing
their statistical properties. Therefore, $r_k r_{k-1}^*$ in (5-2-65) can be
expressed as

$$
r_k r_{k-1}^* = \mathcal{E}_s + \sqrt{\mathcal{E}_s}\,(n_k + n_{k-1}^*) + n_k n_{k-1}^*
\tag{5-2-66}
$$

The complication in determining the pdf of the phase is the term
$n_k n_{k-1}^*$. However, at SNRs of practical interest, the term $n_k n_{k-1}^*$
is small relative to the dominant noise term
$\sqrt{\mathcal{E}_s}\,(n_k + n_{k-1}^*)$. If we neglect the term $n_k n_{k-1}^*$
and we also normalize $r_k r_{k-1}^*$ by dividing through by
$\sqrt{\mathcal{E}_s}$, the new set of decision metrics becomes

$$
\begin{aligned}
x &= \sqrt{\mathcal{E}_s} + \mathrm{Re}\,(n_k + n_{k-1}^*) \\
y &= \mathrm{Im}\,(n_k + n_{k-1}^*)
\end{aligned}
\tag{5-2-67}
$$

FIGURE 5-2-11 Block diagram of DPSK demodulator.


The variables $x$ and $y$ are uncorrelated gaussian random variables with
identical variances $\sigma_n^2 = N_0$. The phase is

$$
\Theta_r = \tan^{-1} \frac{y}{x}
\tag{5-2-68}
$$

At this stage, we have a problem that is identical to the one we solved
previously for phase-coherent demodulation. The only difference is that the
noise variance is now twice as large as in the case of PSK. Thus we conclude
that the performance of DPSK is 3 dB poorer than that for PSK. This result is
relatively good for $M \ge 4$, but it is pessimistic for $M = 2$ in the sense
that the loss in binary DPSK relative to binary PSK is less than 3 dB at large
SNR. This is demonstrated below.

In binary DPSK, the two possible transmitted phase differences are 0 and
$\pi$ rad. As a consequence, only the real part of $r_k r_{k-1}^*$ is needed for
recovering the information. Using (5-2-67), we express the real part as

$$
\mathrm{Re}\,(r_k r_{k-1}^*) = \tfrac{1}{2}\,(r_k r_{k-1}^* + r_k^* r_{k-1})
$$

Because the phase difference between the two successive signaling intervals is
zero, an error is made if $\mathrm{Re}\,(r_k r_{k-1}^*) < 0$. The probability that
$r_k r_{k-1}^* + r_k^* r_{k-1} < 0$ is a special case of a derivation, given in
Appendix B, concerned with the probability that a general quadratic form in
complex-valued gaussian random variables is less than zero. The general form
for this probability is given by (B-21) of Appendix B, and it depends entirely
on the first and second moments of the complex-valued gaussian random variables
$r_k$ and $r_{k-1}$. Upon evaluating the moments and the parameters that are
functions of the moments, we obtain the probability of error for binary DPSK in
the form

$$
P_b = \tfrac{1}{2}\, e^{-\mathcal{E}_b/N_0}
\tag{5-2-69}
$$

where $\mathcal{E}_b/N_0$ is the SNR per bit.

The graph is shown in Fig. 5-2-12. Also shown in that illustration is the
probability of error for binary, coherent PSK. We observe that at error
probabilities of $P_b \le 10^{-3}$ the difference in SNR between binary PSK and
binary DPSK is less than 3 dB. In fact, at $P_b \le 10^{-5}$, the difference in
SNR is less than 1 dB.
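These SNR differences can be estimated by solving (5-2-57) and (5-2-69) for the SNR per bit that gives a target error probability. The sketch below is illustrative: the bisection search, its bracketing interval of 0 to 20 dB, and the function names are all assumptions of the example.

```python
import math


def pb_bpsk(gamma_b: float) -> float:
    """Coherent binary PSK, (5-2-57): Q(sqrt(2 gamma_b)) = 0.5 erfc(sqrt(gamma_b))."""
    return 0.5 * math.erfc(math.sqrt(gamma_b))


def pb_dpsk(gamma_b: float) -> float:
    """Binary DPSK, (5-2-69): 0.5 exp(-gamma_b)."""
    return 0.5 * math.exp(-gamma_b)


def required_snr_db(pb_func, target: float,
                    lo_db: float = 0.0, hi_db: float = 20.0) -> float:
    """Bisection on the dB scale; pb_func must decrease with SNR."""
    for _ in range(60):
        mid = 0.5 * (lo_db + hi_db)
        if pb_func(10.0 ** (mid / 10.0)) > target:
            lo_db = mid   # error rate still too high: need more SNR
        else:
            hi_db = mid
    return 0.5 * (lo_db + hi_db)


if __name__ == "__main__":
    for target in (1e-3, 1e-5):
        gap = required_snr_db(pb_dpsk, target) - required_snr_db(pb_bpsk, target)
        print(target, round(gap, 2), "dB")
```

The gap shrinks as the target error probability decreases, consistent with the observation that the 3 dB figure is pessimistic for binary DPSK.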

The probability of a binary digit error for four-phase DPSK with Gray
coding can be expressed in terms of well-known functions, but its derivation is
quite involved. We simply state the result at this point and refer the interested
reader to Appendix C for the details of derivation. It is expressed in the form

$$
P_b = Q_1(a, b) - \tfrac{1}{2} I_0(ab) \exp\left[ -\tfrac{1}{2}(a^2 + b^2) \right]
\tag{5-2-70}
$$

where $Q_1(a, b)$ is the Marcum Q function defined by (2-1-122) and (2-1-123),
$I_0(x)$ is the modified Bessel function of order zero, defined by (2-1-120), and
the parameters $a$ and $b$ are defined as

$$
a = \sqrt{2\gamma_b \left( 1 - \sqrt{\tfrac{1}{2}} \right)}, \qquad
b = \sqrt{2\gamma_b \left( 1 + \sqrt{\tfrac{1}{2}} \right)}
\tag{5-2-71}
$$

FIGURE 5-2-12 Probability of error for binary PSK and DPSK.


Figure 5-2-13 illustrates the probability of a binary digit error for two- and
four-phase DPSK and coherent PSK signaling obtained from evaluating the
exact formulas derived in this section. Since binary DPSK is only slightly
inferior to binary PSK at large SNR, and DPSK does not require an elaborate
method for estimating the carrier phase, it is often used in digital
communications systems. On the other hand, four-phase DPSK is approximately
2.3 dB poorer in performance than four-phase PSK at large SNR. Consequently the
choice between these two four-phase systems is not as clear cut. One must
weigh the 2.3 dB loss against the reduction in implementation complexity.

FIGURE 5-2-13 Probability of bit error for binary and four-phase PSK and DPSK.

5-2-9 Probability of Error for QAM 

Recall from Section 4-3 that QAM signal waveforms may be expressed as

$$
s_m(t) = A_{mc}\, g(t) \cos 2\pi f_c t - A_{ms}\, g(t) \sin 2\pi f_c t,
 \qquad 0 \le t \le T
\tag{5-2-72}
$$

where $A_{mc}$ and $A_{ms}$ are the information-bearing signal amplitudes of the
quadrature carriers and $g(t)$ is the signal pulse. The vector representation of
these waveforms is

$$
\mathbf{s}_m = \left[ A_{mc}\sqrt{\tfrac{1}{2}\mathcal{E}_g} \quad
 A_{ms}\sqrt{\tfrac{1}{2}\mathcal{E}_g} \right]
\tag{5-2-73}
$$

To determine the probability of error for QAM, we must specify the signal
point constellation. We begin with QAM signal sets that have $M = 4$ points.
Figure 5-2-14 illustrates two four-point signal sets. The first is a four-phase
modulated signal and the second is a QAM signal with two amplitude levels,
labeled $A_1$ and $A_2$, and four phases. Because the probability of error is
dominated by the minimum distance between pairs of signal points, let us
impose the condition that $d_{\min}^{(e)} = 2A$ for both signal constellations and
let us evaluate the average transmitter power, based on the premise that all
signal points are equally probable. For the four-phase signal, we have

$$
P_{av} = \tfrac{1}{4}(4)\, 2A^2 = 2A^2
\tag{5-2-74}
$$

For the two-amplitude, four-phase QAM, we place the points on circles of
radii $A$ and $\sqrt{3}A$. Thus, $d_{\min}^{(e)} = 2A$, and

$$
P_{av} = \tfrac{1}{4}\left[ 2(3A^2) + 2A^2 \right] = 2A^2
\tag{5-2-75}
$$

which is the same average power as the $M = 4$-phase signal constellation.
Hence, for all practical purposes, the error rate performance of the two signal 



FIGURE 5-2-14 Two four-point signal constellations. 
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FIGURE 5-2-15 



(c) (rf) 

Four eight-point QAM signal constellations. 


sets is the same. In other words, there is no advantage of the two-amplitude 
QAM signal set over M = 4-phase modulation. 

Next, let us consider M = 8 QAM. In this case, there are many possible
signal constellations. We shall consider the four signal constellations shown in
Fig. 5-2-15, all of which consist of two amplitudes and have a minimum
distance between signal points of 2A. The coordinates $(A_{mc}, A_{ms})$ for each
signal point, normalized by A, are given in the figure. Assuming that the signal
points are equally probable, the average transmitted signal power is

$$P_{av} = \frac{1}{M}\sum_{m=1}^{M}(A_{mc}^2 + A_{ms}^2) = \frac{A^2}{M}\sum_{m=1}^{M}(a_{mc}^2 + a_{ms}^2)$$ (5-2-76)

where $(a_{mc}, a_{ms})$ are the coordinates of the signal points, normalized by A.

The two signal sets (a) and (c) in Fig. 5-2-15 contain signal points that fall
on a rectangular grid and have $P_{av} = 6A^2$. The signal set (b) requires an average
transmitted power $P_{av} = 6.83A^2$, and (d) requires $P_{av} = 4.73A^2$. Therefore, the
fourth signal set requires approximately 1 dB less power than sets (a) and (c) and
1.6 dB less power than set (b) to achieve the same probability of error. This
signal constellation is known to be the best eight-point QAM constellation
because it requires the least power for a given minimum distance between
signal points.
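
The power comparison above can be sketched numerically. The Python fragment below applies (5-2-76) to normalized coordinate sets and also reports the minimum distance. The coordinates are assumptions: they are chosen to reproduce the minimum distance 2A and the average powers quoted in the text, and they are not copied from Figs. 5-2-14 and 5-2-15. The function names are illustrative.

```python
import math
from itertools import combinations


def avg_power(points):
    """Average power (in units of A^2) of equally probable points, per (5-2-76)."""
    return sum(x * x + y * y for x, y in points) / len(points)


def min_distance(points):
    """Minimum euclidean distance (in units of A) between any two points."""
    return min(math.dist(p, q) for p, q in combinations(points, 2))


R2, R3 = math.sqrt(2.0), math.sqrt(3.0)
SIGNS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
INNER = [(float(sx), float(sy)) for sx, sy in SIGNS]   # (+-1, +-1)

# Assumed normalized layouts (hypothetical coordinates, see lead-in).
constellations = {
    "4-phase": INNER,
    "4-point, radii 1 and sqrt(3)": [(1.0, 0.0), (-1.0, 0.0), (0.0, R3), (0.0, -R3)],
    "8-QAM (a)": [(float(x), float(y)) for x in (-3, -1, 1, 3) for y in (-1, 1)],
    "8-QAM (b)": INNER + [(sx * (1 + R2), sy * (1 + R2)) for sx, sy in SIGNS],
    "8-QAM (c)": [(float(x), float(y)) for x in (-2, 0, 2) for y in (-2, 0, 2)
                  if (x, y) != (0, 0)],
    "8-QAM (d)": INNER + [(1 + R3, 0.0), (-(1 + R3), 0.0),
                          (0.0, 1 + R3), (0.0, -(1 + R3))],
}

if __name__ == "__main__":
    best = avg_power(constellations["8-QAM (d)"])
    for name, pts in constellations.items():
        p = avg_power(pts)
        print(f"{name:30s} P_av = {p:.3f} A^2, d_min = {min_distance(pts):.3f} A, "
              f"{10 * math.log10(p / best):+.2f} dB relative to (d)")
```

Under these assumed coordinates the four eight-point sets all have minimum distance 2A, and set (d) needs about 1 dB less power than the rectangular sets and about 1.6 dB less than set (b).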

For $M \ge 16$, there are many more possibilities for selecting the QAM signal
points in the two-dimensional space. For example, we may choose a circular
multiamplitude constellation for M = 16, as shown in Fig. 4-3-4. In this case,
the signal points at a given amplitude level are phase-rotated by $\tfrac{1}{4}\pi$ relative to
the signal points at adjacent amplitude levels. This 16-QAM constellation is a
generalization of the optimum 8-QAM constellation. However, the circular
16-QAM constellation is not the best 16-point QAM signal constellation for
the AWGN channel.

Rectangular QAM signal constellations have the distinct advantage of being
easily generated as two PAM signals impressed on phase-quadrature carriers.
In addition, they are easily demodulated. Although they are not the best M-ary
QAM signal constellations for $M \ge 16$, the average transmitted power required
to achieve a given minimum distance is only slightly greater than the average
power required for the best M-ary QAM signal constellation. For these
reasons, rectangular M-ary QAM signals are most frequently used in practice.

For rectangular signal constellations in which $M = 2^k$, where k is even, the
QAM signal constellation is equivalent to two PAM signals on quadrature
carriers, each having $\sqrt{M} = 2^{k/2}$ signal points. Since the signals in the
phase-quadrature components can be perfectly separated at the demodulator,
the probability of error for QAM is easily determined from the probability of
error for PAM. Specifically, the probability of a correct decision for the M-ary
QAM system is

$$P_c = (1 - P_{\sqrt{M}})^2$$ (5-2-77)

where $P_{\sqrt{M}}$ is the probability of error of a $\sqrt{M}$-ary PAM with one-half the
average power in each quadrature signal of the equivalent QAM system. By
appropriately modifying the probability of error for M-ary PAM, we obtain

$$P_{\sqrt{M}} = 2\left(1 - \frac{1}{\sqrt{M}}\right) Q\left(\sqrt{\frac{3}{M-1}\,\frac{\mathcal{E}_{av}}{N_0}}\right)$$ (5-2-78)

where $\mathcal{E}_{av}/N_0$ is the average SNR per symbol. Therefore, the probability of a
symbol error for the M-ary QAM is

$$P_M = 1 - (1 - P_{\sqrt{M}})^2$$ (5-2-79)

Note that this result is exact for $M = 2^k$ when k is even. On the other hand,
when k is odd, there is no equivalent $\sqrt{M}$-ary PAM system. This is no
problem, however, because it is rather easy to determine the error rate for a
rectangular signal set. If we employ the optimum detector that bases its
decisions on the optimum distance metrics given by (5-1-43), it is relatively
straightforward to show that the symbol error probability is tightly
upper-bounded as

$$P_M \le 1 - \left[1 - 2Q\left(\sqrt{\frac{3\mathcal{E}_{av}}{(M-1)N_0}}\right)\right]^2 \le 4Q\left(\sqrt{\frac{3k\mathcal{E}_{b\,av}}{(M-1)N_0}}\right)$$ (5-2-80)

for any $k \ge 1$, where $\mathcal{E}_{b\,av}/N_0$ is the average SNR per bit. The probability of a
symbol error is plotted in Fig. 5-2-16 as a function of the average SNR per bit.

FIGURE 5-2-16 Probability of a symbol error for QAM.
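
The chain from (5-2-77) to (5-2-80) can be evaluated directly. The following Python sketch assumes the usual expression for the error probability of a $\sqrt{M}$-ary PAM component, $2(1 - 1/\sqrt{M})\,Q(\sqrt{3\mathcal{E}_{av}/((M-1)N_0)})$, as the content of (5-2-78). The function names and the choice of linear (not dB) SNR arguments are illustrative.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pam_component_error(M, snr_symbol):
    """Assumed form of (5-2-78): error probability of each sqrt(M)-ary PAM."""
    return 2.0 * (1.0 - 1.0 / math.sqrt(M)) * q_func(math.sqrt(3.0 * snr_symbol / (M - 1)))


def qam_symbol_error(M, snr_symbol):
    """Exact symbol error probability (5-2-79) for square M-ary QAM."""
    root = math.isqrt(M)
    if root * root != M:
        raise ValueError("exact formula requires M = 2^k with k even")
    p = pam_component_error(M, snr_symbol)
    return p * (2.0 - p)          # equals 1 - (1 - p)^2, but stable for small p


def qam_upper_bound(M, snr_bit):
    """Looser bound in (5-2-80), written in terms of the average SNR per bit."""
    k = math.log2(M)
    return 4.0 * q_func(math.sqrt(3.0 * k * snr_bit / (M - 1)))


if __name__ == "__main__":
    for M in (4, 16, 64):
        k = math.log2(M)
        for snr_bit_db in (6, 10, 14):
            snr_bit = 10.0 ** (snr_bit_db / 10.0)
            print(M, snr_bit_db, qam_symbol_error(M, k * snr_bit), qam_upper_bound(M, snr_bit))
```

For M = 4 the exact expression collapses to $1 - [1 - Q(\sqrt{\mathcal{E}_{av}/N_0})]^2$, the familiar four-phase result, which is a convenient sanity check.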

For non-rectangular QAM signal constellations, we may upper-bound the
error probability by use of a union bound. An obvious upper bound is

$$P_M < (M - 1)\, Q\left(\sqrt{[d_{\min}^{(e)}]^2 / 2N_0}\right)$$

where $d_{\min}^{(e)}$ is the minimum euclidean distance between signal points. This
bound may be loose when M is large. In such a case, we may approximate $P_M$
by replacing $M - 1$ by $M_n$, where $M_n$ is the largest number of neighboring
points that are at distance $d_{\min}^{(e)}$ from any constellation point.

It is interesting to compare the performance of QAM with that of PSK for
any given signal size M, since both types of signals are two-dimensional. Recall
that for M-ary PSK, the probability of a symbol error is approximated as

$$P_M \approx 2Q\left(\sqrt{2\gamma_s}\,\sin\frac{\pi}{M}\right)$$ (5-2-81)

where $\gamma_s$ is the SNR per symbol. For M-ary QAM, we may use the expression
(5-2-78). Since the error probability is dominated by the argument of the Q
function, we may simply compare the arguments of Q for the two signal
formats. Thus, the ratio of these two arguments is

$$\mathcal{R}_M = \frac{3/(M - 1)}{2\sin^2(\pi/M)}$$ (5-2-82)

For example, when M = 4, we have $\mathcal{R}_M = 1$. Hence, 4-PSK and 4-QAM yield
comparable performance for the same SNR per symbol. On the other hand,
when M > 4 we find that $\mathcal{R}_M > 1$, so that M-ary QAM yields better
performance than M-ary PSK. Table 5-2-1 illustrates the SNR advantage of
QAM over PSK for several values of M. For example, we observe that
32-QAM has a 7 dB SNR advantage over 32-PSK.

TABLE 5-2-1 SNR ADVANTAGE OF M-ARY QAM OVER M-ARY PSK

    M     10 log10 R_M (dB)
    8     1.65
   16     4.20
   32     7.02
   64     9.95
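
The entries of Table 5-2-1 follow directly from (5-2-82). A minimal Python sketch (the function names are illustrative) is:

```python
import math


def qam_over_psk_ratio(M):
    """Ratio (5-2-82) of the Q-function arguments for M-ary QAM and M-ary PSK."""
    return (3.0 / (M - 1)) / (2.0 * math.sin(math.pi / M) ** 2)


def qam_advantage_db(M):
    """SNR advantage of QAM over PSK in decibels."""
    return 10.0 * math.log10(qam_over_psk_ratio(M))


if __name__ == "__main__":
    for M in (4, 8, 16, 32, 64):
        print(f"M = {M:2d}: {qam_advantage_db(M):5.2f} dB")
```

The M = 4 row is included only to confirm that the ratio equals 1 (0 dB) in that case.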


5-2-10 Comparison of Digital Modulation Methods 

The digital modulation methods described in this chapter can be compared in a 
number of ways. For example, one can compare them on the basis of the SNR 
required to achieve a specified probability of error. However, such a 
comparison would not be very meaningful, unless it were made on the basis of 
some constraint, such as a fixed data rate of transmission or, equivalently, on 
the basis of a fixed bandwidth. With this goal in mind, let us consider the 
bandwidth requirements for several modulation methods. 

For multiphase signals, the channel bandwidth required is simply the
bandwidth of the equivalent lowpass signal pulse g(t), which depends on its
detailed characteristics. For our purposes, we assume that g(t) is a pulse of
duration T and that its bandwidth W is approximately equal to the reciprocal
of T. Thus, W = 1/T and, since $T = k/R = (\log_2 M)/R$, it follows that

$$W = \frac{R}{\log_2 M}$$ (5-2-83)

Therefore, as M is increased, the channel bandwidth required, when the bit
rate R is fixed, decreases. The bandwidth efficiency is measured by the bit rate
to bandwidth ratio, which is

$$\frac{R}{W} = \log_2 M$$ (5-2-84)

The bandwidth-efficient method for transmitting PAM is single-sideband.
Then, the channel bandwidth required to transmit the signal is approximately
equal to 1/2T and, since $T = k/R = (\log_2 M)/R$, it follows that

$$\frac{R}{W} = 2\log_2 M$$ (5-2-85)

This is a factor of two better than PSK.

In the case of QAM, we have two orthogonal carriers, with each carrier
having a PAM signal. Thus, we double the rate relative to PAM. However, the
QAM signal must be transmitted via double sideband. Consequently, QAM
and PAM have the same bandwidth efficiency when the bandwidth is
referenced to the bandpass signal.

Orthogonal signals have totally different bandwidth requirements. If the
$M = 2^k$ orthogonal signals are constructed by means of orthogonal carriers with
minimum frequency separation of 1/2T for orthogonality, the bandwidth
required for transmission of $k = \log_2 M$ information bits is

$$W = \frac{M}{2T} = \frac{M}{2(k/R)} = \frac{M}{2\log_2 M}\,R$$ (5-2-86)

In this case, the bandwidth increases as M increases. Similar relationships
obtain for simplex and biorthogonal signals. In the case of biorthogonal signals,
the required bandwidth is one half of that for orthogonal signals.
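
The bandwidth relations (5-2-84) to (5-2-86) are simple enough to tabulate. The sketch below, with illustrative function names, returns the normalized data rate R/W for each family under the same assumptions as the text (pulse bandwidth approximately 1/T, single-sideband PAM, orthogonal carriers spaced by 1/2T).

```python
import math


def rw_psk(M):
    """R/W for M-ary PSK, from (5-2-84)."""
    return math.log2(M)


def rw_pam_ssb(M):
    """R/W for single-sideband M-ary PAM, from (5-2-85)."""
    return 2.0 * math.log2(M)


def rw_orthogonal(M):
    """R/W for M-ary orthogonal signals, from (5-2-86): W = M R / (2 log2 M)."""
    return 2.0 * math.log2(M) / M


def rw_biorthogonal(M):
    """R/W for biorthogonal signals, which need half the orthogonal bandwidth."""
    return 2.0 * rw_orthogonal(M)


if __name__ == "__main__":
    print(" M   PSK  PAM(SSB)  orthogonal  biorthogonal")
    for M in (2, 4, 8, 16, 32, 64):
        print(f"{M:2d}  {rw_psk(M):4.1f}  {rw_pam_ssb(M):8.1f}  "
              f"{rw_orthogonal(M):10.3f}  {rw_biorthogonal(M):12.3f}")
```

The output makes the contrast explicit: R/W grows with M for PSK and PAM, while for orthogonal signals it never exceeds 1 and falls as M increases.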

A compact and meaningful comparison of these modulation methods is one
based on the normalized data rate R/W (bits per second per hertz of
bandwidth) versus the SNR per bit ($\mathcal{E}_b/N_0$) required to achieve a given error
probability. Figure 5-2-17 illustrates the graph of R/W versus SNR per bit for
PAM, QAM, PSK, and orthogonal signals, for the case in which the error
probability is $P_M = 10^{-5}$. We observe that in the case of PAM, QAM, and PSK,
increasing M results in a higher bit rate-to-bandwidth ratio R/W. However, the
cost of achieving the higher data rate is an increase in the SNR per bit.
Consequently, these modulation methods are appropriate for communication
channels that are bandwidth limited, where we desire a bit rate-to-bandwidth
ratio R/W > 1 and where there is sufficiently high SNR to support increases in
M. Telephone channels and digital microwave radio channels are examples of
such bandlimited channels.

In contrast, M-ary orthogonal signals yield a bit rate-to-bandwidth ratio of
$R/W \le 1$. As M increases, R/W decreases due to an increase in the required
channel bandwidth. However, the SNR per bit required to achieve a given
error probability (in this case, $P_M = 10^{-5}$) decreases as M increases.
Consequently, M-ary orthogonal signals are appropriate for power-limited channels
that have sufficiently large bandwidth to accommodate a large number of
signals. In this case, as $M \to \infty$, the error probability can be made as small
as desired, provided that $\mathcal{E}_b/N_0 > 0.693$ (-1.6 dB). This is the minimum SNR
per bit required to achieve reliable transmission in the limit as the
channel bandwidth $W \to \infty$ and the corresponding bit rate-to-bandwidth ratio
$R/W \to 0$.

Also shown in Fig. 5-2-17 is the graph for the normalized capacity of the
bandlimited AWGN channel, which is due to Shannon (1948). The ratio C/W,
where C (= R) is the capacity in bits/s, represents the highest achievable bit
rate-to-bandwidth ratio on this channel. Hence, it serves as the upper bound
on the bandwidth efficiency of any type of modulation. This bound is derived
in Chapter 7 and discussed in greater detail there.

FIGURE 5-2-17 Comparison of several modulation methods at $10^{-5}$ symbol error probability.


5-3 OPTIMUM RECEIVER FOR CPM SIGNALS 

We recall from Section 4-3 that CPM is a modulation method with memory.
The memory results from the continuity of the transmitted carrier phase from
one signal interval to the next. The transmitted CPM signal may be expressed
as

$$s(t) = \sqrt{\frac{2\mathcal{E}}{T}} \cos[2\pi f_c t + \phi(t; \mathbf{I})]$$ (5-3-1)

where $\phi(t; \mathbf{I})$ is the carrier phase. The filtered received signal for an additive
gaussian noise channel is

$$r(t) = s(t) + n(t)$$ (5-3-2)

where

$$n(t) = n_c(t) \cos 2\pi f_c t - n_s(t) \sin 2\pi f_c t$$ (5-3-3)


5-3-1 Optimum Demodulation and Detection of CPM 

The optimum receiver for this signal consists of a correlator followed by a 
maximum-likelihood sequence detector that searches the paths through the 
state trellis for the minimum euclidean distance path. The Viterbi algorithm is 
an efficient method for performing this search. Let us establish the general 
state trellis structure for CPM and then describe the metric computations. 

Recall that the carrier phase for a CPM signal with a fixed modulation index
h may be expressed as

$$\phi(t; \mathbf{I}) = 2\pi h \sum_{k=-\infty}^{n} I_k\, q(t - kT)$$
$$= \pi h \sum_{k=-\infty}^{n-L} I_k + 2\pi h \sum_{k=n-L+1}^{n} I_k\, q(t - kT)$$
$$= \theta_n + \theta(t; \mathbf{I}), \qquad nT \le t \le (n+1)T$$ (5-3-4)

where we have assumed that q(t) = 0 for t < 0, $q(t) = \tfrac{1}{2}$ for $t \ge LT$, and

$$q(t) = \int_0^t g(\tau)\, d\tau$$ (5-3-5)

The signal pulse g(t) = 0 for t < 0 and $t \ge LT$. For L = 1, we have a full
response CPM, and for L > 1, where L is a positive integer, we have a partial
response CPM signal.

Now, when h is rational, i.e., h = m/p where m and p are relatively prime
positive integers, the CPM scheme can be represented by a trellis. In this case,
there are p phase states

$$\Theta_s = \left\{0, \frac{\pi m}{p}, \frac{2\pi m}{p}, \ldots, \frac{(p-1)\pi m}{p}\right\}$$ (5-3-6)

when m is even, and 2p phase states

$$\Theta_s = \left\{0, \frac{\pi m}{p}, \ldots, \frac{(2p-1)\pi m}{p}\right\}$$ (5-3-7)

when m is odd. If L = 1, these are the only states in the trellis. On the other
hand, if L > 1, we have an additional number of states due to the partial
response character of the signal pulse g(t). These additional states can be
identified by expressing $\theta(t; \mathbf{I})$ given by (5-3-4) as

$$\theta(t; \mathbf{I}) = 2\pi h \sum_{k=n-L+1}^{n-1} I_k\, q(t - kT) + 2\pi h I_n\, q(t - nT)$$ (5-3-8)


The first term on the right-hand side of (5-3-8) depends on the information
symbols $(I_{n-1}, I_{n-2}, \ldots, I_{n-L+1})$, which is called the correlative state vector,
and represents the phase term corresponding to signal pulses that have not
reached their final value. The second term in (5-3-8) represents the phase
contribution due to the most recent symbol $I_n$. Hence, the state of the CPM
signal (or the modulator) at time t = nT may be expressed as the combined
phase state and correlative state, denoted as

$$S_n = \{\theta_n, I_{n-1}, I_{n-2}, \ldots, I_{n-L+1}\}$$ (5-3-9)

for a partial response signal pulse of length LT, where L > 1. In this case, the
number of states is

$$N_s = \begin{cases} pM^{L-1} & (\text{even } m) \\ 2pM^{L-1} & (\text{odd } m) \end{cases}$$ (5-3-10)

when h = m/p.
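
The state counts in (5-3-6), (5-3-7), and (5-3-10) can be checked by enumeration. The following Python sketch represents phases as exact fractions of $\pi$, reduced modulo $2\pi$. The function names and the representation are illustrative choices, not part of the text.

```python
from fractions import Fraction
from itertools import product
from math import gcd


def phase_states(m, p):
    """Distinct phase states for h = m/p, as multiples of pi in [0, 2)."""
    if gcd(m, p) != 1:
        raise ValueError("m and p must be relatively prime")
    return sorted({Fraction(m * k, p) % 2 for k in range(2 * p)})


def num_states(m, p, M, L):
    """Number of trellis states N_s from (5-3-10)."""
    n_phase = p if m % 2 == 0 else 2 * p
    return n_phase * M ** (L - 1)


def enumerate_states(m, p, M, L):
    """All combined states (theta_n, I_{n-1}, ..., I_{n-L+1})."""
    symbols = range(-(M - 1), M, 2)          # +-1, +-3, ..., +-(M-1)
    return [(theta, *corr)
            for theta in phase_states(m, p)
            for corr in product(symbols, repeat=L - 1)]


if __name__ == "__main__":
    # Binary CPM with h = 3/4 and L = 2: 8 phase states, 16 states in total.
    print([str(t) for t in phase_states(3, 4)])
    print(num_states(3, 4, M=2, L=2), len(enumerate_states(3, 4, M=2, L=2)))
```

Enumerating the set directly, instead of trusting the closed-form count, shows why an odd m doubles the number of phase states: the multiples of $\pi m/p$ repeat only after 2p steps.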

Now, suppose the state of the modulator at t = nT is $S_n$. The effect of the
new symbol in the time interval $nT \le t \le (n+1)T$ is to change the state from
$S_n$ to $S_{n+1}$. Hence, at t = (n + 1)T, the state becomes

$$S_{n+1} = (\theta_{n+1}, I_n, I_{n-1}, \ldots, I_{n-L+2})$$

where

$$\theta_{n+1} = \theta_n + \pi h I_{n-L+1}$$

Example 5-3-1

Consider a binary CPM scheme with a modulation index h = 3/4 and a
partial response pulse with L = 2. Let us determine the states $S_n$ of the CPM
scheme and sketch the phase tree and state trellis.

First, we note that there are 2p = 8 phase states, namely,

$$\Theta_s = \{0, \pm\tfrac{1}{4}\pi, \pm\tfrac{1}{2}\pi, \pm\tfrac{3}{4}\pi, \pi\}$$

For each of these phase states, there are two states that result from the
memory of the CPM scheme. Hence, the total number of states is $N_s = 16$,
namely,

$(0, 1)$, $(0, -1)$, $(\pi, 1)$, $(\pi, -1)$, $(\tfrac{1}{4}\pi, 1)$, $(\tfrac{1}{4}\pi, -1)$, $(\tfrac{1}{2}\pi, 1)$, $(\tfrac{1}{2}\pi, -1)$,
$(\tfrac{3}{4}\pi, 1)$, $(\tfrac{3}{4}\pi, -1)$, $(-\tfrac{1}{4}\pi, 1)$, $(-\tfrac{1}{4}\pi, -1)$, $(-\tfrac{1}{2}\pi, 1)$, $(-\tfrac{1}{2}\pi, -1)$,
$(-\tfrac{3}{4}\pi, 1)$, $(-\tfrac{3}{4}\pi, -1)$

If the system is in phase state $\theta_n = -\tfrac{1}{4}\pi$ and $I_{n-1} = -1$, then

$$\theta_{n+1} = \theta_n + \pi h I_{n-1} = -\tfrac{1}{4}\pi - \tfrac{3}{4}\pi = -\pi$$

which is the same phase state as $\pi$ (modulo $2\pi$).

The state trellis is illustrated in Fig. 5-3-1. A path through the state trellis
corresponding to the sequence (1, -1, -1, -1, 1, 1) is illustrated in Fig.
5-3-2.

In order to sketch the phase tree, we must know the signal pulse shape
g(t). Figure 5-3-3 illustrates the phase tree when g(t) is a rectangular pulse
of duration 2T, with initial state (0, 1).

FIGURE 5-3-1 State trellis for partial response (L = 2) CPM with h = 3/4.
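
The state update used in Example 5-3-1, $\theta_{n+1} = \theta_n + \pi h I_{n-1}$ for L = 2, can be sketched as a small transition function. In the Python fragment below, phases are stored as exact fractions of $\pi$ wrapped into the interval $(-\pi, \pi]$, so that $-\pi$ and $\pi$ are treated as the same state. The names `next_state`, `trace`, and `reachable_states` are illustrative, and the breadth-first search is merely one convenient way of listing the states.

```python
from fractions import Fraction

H = Fraction(3, 4)                 # modulation index of Example 5-3-1


def wrap(theta):
    """Wrap a phase, in units of pi, into the interval (-1, 1]."""
    theta = theta % 2              # now in [0, 2)
    return theta - 2 if theta > 1 else theta


def next_state(state, symbol):
    """Map (theta_n, I_{n-1}) and the new symbol I_n to (theta_{n+1}, I_n)."""
    theta, previous = state
    return (wrap(theta + H * previous), symbol)


def trace(initial, symbols):
    """Sequence of states visited when the given symbols are transmitted."""
    path = [initial]
    for s in symbols:
        path.append(next_state(path[-1], s))
    return path


def reachable_states(initial, alphabet=(1, -1)):
    """All states reachable from the initial state (breadth-first search)."""
    seen, frontier = {initial}, [initial]
    while frontier:
        state = frontier.pop()
        for s in alphabet:
            nxt = next_state(state, s)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


if __name__ == "__main__":
    start = (Fraction(0), 1)
    for theta, symbol in trace(start, [1, -1, -1, -1, 1, 1]):
        print(f"({theta} pi, {symbol:+d})")
    print(len(reachable_states(start)), "states")
```

Starting from (0, 1), the search visits all 16 states listed in the example, which is a quick consistency check on the trellis.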

Having established the state trellis representation of CPM, let us now 
consider the metric computations performed in the Viterbi algorithm. 

Metric Computations  By referring back to the mathematical development
for the derivation of the maximum likelihood demodulator given in Section
5-1-4, it is easy to show that the logarithm of the probability of the observed
signal r(t) conditioned on a particular sequence of transmitted symbols $\mathbf{I}$ is
proportional to the cross-correlation metric

$$CM_n(\mathbf{I}) = \int_{-\infty}^{(n+1)T} r(t) \cos[\omega_c t + \phi(t; \mathbf{I})]\, dt$$
$$= CM_{n-1}(\mathbf{I}) + \int_{nT}^{(n+1)T} r(t) \cos[\omega_c t + \theta(t; \mathbf{I}) + \theta_n]\, dt$$ (5-3-11)

The term $CM_{n-1}(\mathbf{I})$ represents the metrics for the surviving sequences up to
time nT, and the term

$$v_n(\mathbf{I}; \theta_n) = \int_{nT}^{(n+1)T} r(t) \cos[\omega_c t + \theta(t; \mathbf{I}) + \theta_n]\, dt$$ (5-3-12)

represents the additional increments to the metrics contributed by the signal in
the time interval $nT \le t \le (n+1)T$. Note that there are $M^L$ possible sequences
$\mathbf{I} = (I_n, I_{n-1}, \ldots, I_{n-L+1})$ of symbols and p (or 2p) possible phase states $\{\theta_n\}$.
Therefore, there are $pM^L$ (or $2pM^L$) different values of $v_n(\mathbf{I}; \theta_n)$, computed in
each signal interval, and each value is used to increment the metrics
corresponding to the $pM^{L-1}$ surviving sequences from the previous signaling
interval. A general block diagram that illustrates the computations of $v_n(\mathbf{I}; \theta_n)$
for the Viterbi decoder is shown in Fig. 5-3-4.

Note that the number of surviving sequences at each state of the Viterbi
decoding process is $pM^{L-1}$ (or $2pM^{L-1}$). For each surviving sequence, we have
M new increments of $v_n(\mathbf{I}; \theta_n)$ that are added to the existing metrics to yield
$pM^L$ (or $2pM^L$) sequences with $pM^L$ (or $2pM^L$) metrics. However, this number
is then reduced back to $pM^{L-1}$ (or $2pM^{L-1}$) survivors with corresponding
metrics by selecting the most probable sequence of the M sequences merging
at each node of the trellis and discarding the other M - 1 sequences.

FIGURE 5-3-2 A single signal path through the trellis.

FIGURE 5-3-3 Phase tree for L = 2 partial response CPM with h = 3/4.


FIGURE 5-3-4 Computation of metric increments.

5-3-2 Performance of CPM Signals

In evaluating the performance of CPM signals achieved with MLSE, we must
determine the minimum euclidean distance of paths through the trellis that
separate at the node at t = 0 and remerge at a later time at the same node.
The distance between two paths through the trellis is related to the
corresponding signals as we now demonstrate.

Suppose that we have two signals $s_i(t)$ and $s_j(t)$ corresponding to two phase
trajectories $\phi(t; \mathbf{I}_i)$ and $\phi(t; \mathbf{I}_j)$. The sequences $\mathbf{I}_i$ and $\mathbf{I}_j$ must be different in
their first symbol. Then, the euclidean distance between the two signals over an
interval of length NT, where 1/T is the symbol rate, is defined as

$$d_{ij}^2 = \int_0^{NT} [s_i(t) - s_j(t)]^2\, dt$$
$$= \int_0^{NT} s_i^2(t)\, dt + \int_0^{NT} s_j^2(t)\, dt - 2\int_0^{NT} s_i(t) s_j(t)\, dt$$
$$= 2N\mathcal{E} - 2\,\frac{2\mathcal{E}}{T} \int_0^{NT} \cos[\omega_c t + \phi(t; \mathbf{I}_i)] \cos[\omega_c t + \phi(t; \mathbf{I}_j)]\, dt$$
$$= 2N\mathcal{E} - \frac{2\mathcal{E}}{T} \int_0^{NT} \cos[\phi(t; \mathbf{I}_i) - \phi(t; \mathbf{I}_j)]\, dt$$
$$= \frac{2\mathcal{E}}{T} \int_0^{NT} \{1 - \cos[\phi(t; \mathbf{I}_i) - \phi(t; \mathbf{I}_j)]\}\, dt$$ (5-3-13)

Hence the euclidean distance is related to the phase difference between the
paths in the state trellis according to (5-3-13).

It is desirable to express the distance $d_{ij}^2$ in terms of the bit energy. Since
$\mathcal{E} = \mathcal{E}_b \log_2 M$, (5-3-13) may be expressed as

$$d_{ij}^2 = 2\mathcal{E}_b \delta_{ij}^2$$ (5-3-14)

where $\delta_{ij}^2$ is defined as

$$\delta_{ij}^2 = \frac{\log_2 M}{T} \int_0^{NT} \{1 - \cos[\phi(t; \mathbf{I}_i) - \phi(t; \mathbf{I}_j)]\}\, dt$$ (5-3-15)

Furthermore, we observe that $\phi(t; \mathbf{I}_i) - \phi(t; \mathbf{I}_j) = \phi(t; \mathbf{I}_i - \mathbf{I}_j)$, so that, with
$\boldsymbol{\xi} = \mathbf{I}_i - \mathbf{I}_j$, (5-3-15) may be written as

$$\delta_{ij}^2 = \frac{\log_2 M}{T} \int_0^{NT} [1 - \cos \phi(t; \boldsymbol{\xi})]\, dt$$ (5-3-16)

where any element of $\boldsymbol{\xi}$ can take the values $0, \pm 2, \pm 4, \ldots, \pm 2(M - 1)$,
except that $\xi_0 \ne 0$.

The error rate performance for CPM is dominated by the term corresponding
to the minimum euclidean distance, and it may be expressed as

$$P_M = K_{\delta_{\min}}\, Q\left(\sqrt{\frac{\mathcal{E}_b}{N_0}\,\delta_{\min}^2}\right)$$ (5-3-17)

where $K_{\delta_{\min}}$ is the number of paths having the minimum distance

$$\delta_{\min}^2 = \lim_{N \to \infty} \min_{i,j} \delta_{ij}^2 = \lim_{N \to \infty} \min_{i,j} \left\{ \frac{\log_2 M}{T} \int_0^{NT} [1 - \cos \phi(t; \mathbf{I}_i - \mathbf{I}_j)]\, dt \right\}$$ (5-3-18)

We note that for conventional binary PSK with no memory, N = 1 and
$\delta_{\min}^2 = \delta_{12}^2 = 2$. Hence, (5-3-17) agrees with our previous result.
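
The normalized distance (5-3-16) is easy to evaluate numerically for a given difference sequence. The following Python sketch does so with a midpoint rule for full response CPFSK, assuming a rectangular pulse so that q(t) rises linearly from 0 to 1/2 over one symbol interval. The function names, the unit symbol interval, and the step count are illustrative choices.

```python
import math


def q_rect(t, T=1.0):
    """Phase response q(t) for a rectangular g(t) of duration T (full response)."""
    if t <= 0.0:
        return 0.0
    if t >= T:
        return 0.5
    return t / (2.0 * T)


def delta_squared(xi, h, M=2, T=1.0, steps_per_symbol=2000, q=q_rect):
    """Midpoint-rule evaluation of (5-3-16) over N = len(xi) symbol intervals."""
    n_steps = len(xi) * steps_per_symbol
    dt = len(xi) * T / n_steps
    total = 0.0
    for i in range(n_steps):
        t = (i + 0.5) * dt
        # Phase difference phi(t; xi) = 2 pi h * sum_k xi_k q(t - kT)
        phi = 2.0 * math.pi * h * sum(x * q(t - k * T, T) for k, x in enumerate(xi))
        total += (1.0 - math.cos(phi)) * dt
    return math.log2(M) / T * total


if __name__ == "__main__":
    # Difference sequence {2, -2}: the two phase paths remerge after two symbols.
    for h in (0.25, 0.5, 0.715):
        print(h, round(delta_squared((2, -2), h), 4))
```

For h = 1/2 (MSK) and the difference sequence {2, -2}, the integral evaluates to 2, the same normalized squared distance as binary PSK.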

Since $\delta_{\min}^2$ characterizes the performance of CPM with MLSE, we can
investigate the effect on $\delta_{\min}^2$ resulting from varying the alphabet size M, the
modulation index h, and the length of the transmitted pulse in partial response
CPM.

First, we consider full response (L = 1) CPM. If we take M = 2 as a
beginning, we note that the sequences

$$\mathbf{I}_i = +1, -1, I_2, I_3$$
$$\mathbf{I}_j = -1, +1, I_2, I_3$$ (5-3-19)

which differ for k = 0, 1 and agree for $k \ge 2$, result in two phase trajectories
that merge after the second symbol. This corresponds to the difference
sequence

$$\boldsymbol{\xi} = \{2, -2, 0, 0, \ldots\}$$ (5-3-20)

The euclidean distance for this sequence is easily calculated from (5-3-16), and
provides an upper bound on $\delta_{\min}^2$. This upper bound for M = 2 is

$$d_B^2(h) = 2\left(1 - \frac{\sin 2\pi h}{2\pi h}\right), \qquad M = 2$$ (5-3-21)

For example, where $h = \tfrac{1}{2}$, which corresponds to MSK, we have $d_B^2(\tfrac{1}{2}) = 2$, so
that $\delta_{\min}^2(\tfrac{1}{2}) \le 2$.

For M > 2 and full response CPM, it is also easily seen that phase
trajectories merge at t = 2T. Hence, an upper bound on $\delta_{\min}^2$ can be obtained
by considering the phase difference sequence $\boldsymbol{\xi} = \{\alpha, -\alpha, 0, 0, \ldots\}$ where
$\alpha = \pm 2, \pm 4, \ldots, \pm 2(M - 1)$. This sequence yields the upper bound

$$d_B^2(h) = \min_{1 \le k \le M-1} \left\{ (2\log_2 M)\left(1 - \frac{\sin 2k\pi h}{2k\pi h}\right) \right\}$$ (5-3-22)

FIGURE 5-3-5 [Caption text lost in extraction; the surviving credit reads: Ltd. Reprinted with permission of the publisher.]

The graphs of $d_B^2(h)$ versus h for M = 2, 4, 8, 16 are shown in Fig. 5-3-5. It is
apparent from these graphs that large gains in performance can be achieved by
increasing the alphabet size M. It must be remembered, however, that
$\delta_{\min}^2(h) \le d_B^2(h)$. That is, the upper bound may not be achievable for all
values of h.
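
The upper bounds (5-3-21) and (5-3-22) are closed-form expressions and can be tabulated directly. The Python sketch below (the function names are illustrative) evaluates $d_B^2(h)$ and the corresponding gain over MSK, $10\log_{10}[d_B^2(h)/2]$; it reports only the bound, not the true minimum distance $\delta_{\min}^2(h)$.

```python
import math


def d_b_squared(h, M=2):
    """Upper bound (5-3-22) on the normalized squared minimum distance (h > 0)."""
    def term(k):
        x = 2.0 * math.pi * k * h
        return 2.0 * math.log2(M) * (1.0 - math.sin(x) / x)
    return min(term(k) for k in range(1, M))


def gain_over_msk_db(h, M=2):
    """Bound on the gain relative to MSK, for which the squared distance is 2."""
    return 10.0 * math.log10(d_b_squared(h, M) / 2.0)


if __name__ == "__main__":
    for M, h in ((2, 0.5), (2, 0.715), (4, 0.914), (8, 0.964)):
        print(f"M = {M}, h = {h}: d_B^2 = {d_b_squared(h, M):.2f}, "
              f"gain = {gain_over_msk_db(h, M):.2f} dB")
```

For M = 2 the minimization runs over the single term k = 1, so the function reduces to (5-3-21).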

The minimum euclidean distance $\delta_{\min}^2(h)$ has been determined, by evaluating
(5-3-16), for a variety of CPM signals by Aulin and Sundberg (1981). For
example, Fig. 5-3-6 illustrates the dependence of the euclidean distance for
binary CPFSK as a function of the modulation index h, with the number N of
bit observation (decision) intervals (N = 1, 2, 3, 4) as a parameter. Also shown
is the upper bound $d_B^2(h)$ given by (5-3-21). In particular, we note that when
$h = \tfrac{1}{2}$, $\delta_{\min}^2(\tfrac{1}{2}) = 2$, which is the same squared distance as PSK (binary or
quaternary) with N = 1. On the other hand, the required observation interval
for MSK is N = 2 intervals, for which we have $\delta_{\min}^2(\tfrac{1}{2}) = 2$. Hence, the
performance of MSK with MLSE is comparable to (binary or quaternary) PSK
as we have previously observed.

We also note from Fig. 5-3-6 that the optimum modulation index for binary
CPFSK is h = 0.715 when the observation interval is N = 3. This yields
$\delta_{\min}^2(0.715) = 2.43$, or a gain of 0.85 dB relative to MSK.

FIGURE 5-3-6 Squared minimum euclidean distance as a function of the
modulation index for binary CPFSK. The upper bound is $d_B^2$. [From Aulin and
Sundberg (1981). © 1981 IEEE.]

Figure 5-3-7 illustrates the euclidean distance as a function of h for
M = 4 CPFSK, with the length of the observation interval N as a parameter.
Also shown (as a dashed line where it is not reached) is the upper bound $d_B^2$
evaluated from (5-3-22). Note that $\delta_{\min}^2$ achieves the upper bound for several
values of h for some N. In particular, note that the maximum value of $d_B^2$,
which occurs at $h \approx 0.9$, is approximately reached for N = 8 observed symbol
intervals. The true maximum is achieved at h = 0.914 with N = 9. For this case,
$\delta_{\min}^2(0.914) = 4.2$, which represents a 3.2 dB gain over MSK. Also note that the
euclidean distance contains minima at $h = \tfrac{1}{3}, \tfrac{1}{2}, \tfrac{2}{3}, 1$, etc. These values of h are
called weak modulation indices and should be avoided. Similar results are
available for larger values of M, and may be found in the paper by Aulin and
Sundberg (1981) and the text by Anderson et al. (1986).

FIGURE 5-3-7 Squared minimum euclidean distance as a function of
the modulation index for quaternary CPFSK. The upper bound is $d_B^2$. [From
Aulin and Sundberg (1981). © 1981 IEEE.]

FIGURE 5-3-8 Upper bound $d_B^2$ on the minimum distance for
partial response (raised cosine pulse) binary CPM. [From Sundberg (1986).
© 1986 IEEE.]


Large performance gains can also be achieved with MLSE of CPM by using
partial response signals. For example, the distance bound $d_B^2(h)$ for partial
response, raised cosine pulses given by

$$g(t) = \begin{cases} \dfrac{1}{2LT}\left(1 - \cos\dfrac{2\pi t}{LT}\right) & (0 \le t \le LT) \\ 0 & (\text{otherwise}) \end{cases}$$ (5-3-23)

is shown in Fig. 5-3-8 for M = 2. Here, note that, as L increases, $d_B^2$ also
achieves higher values. Clearly, the performance of CPM improves as the
correlative memory L increases, but h must also be increased in order to
achieve the larger values of $d_B^2$. Since a larger modulation index implies a
larger bandwidth (for fixed L), while a larger memory length L (for fixed h)
implies a smaller bandwidth, it is better to compare the euclidean distance as a
function of the normalized bandwidth $2WT_b$, where W is the 99% power
bandwidth and $T_b$ is the bit interval. Figure 5-3-9 illustrates this type of
comparison with MSK used as a point of reference (0 dB). Note from this
figure that there are several decibels to be gained by using partial response
signals and higher signaling alphabets. The major price to be paid for this
performance gain is the added exponentially increasing complexity in the
implementation of the Viterbi decoder.

The performance results shown in Fig. 5-3-9 illustrate that 3-4 dB gain
relative to MSK can be easily obtained with relatively no increase in bandwidth
by the use of raised cosine partial response CPM and M = 4. Although these
results are for raised cosine signal pulses, similar gains can be achieved with
other partial response pulse shapes. We emphasize that this gain in SNR is
achieved by introducing memory into the signal modulation and exploiting the
memory in the demodulation of the signal. No redundancy through coding has
been introduced. In effect, the code has been built into the modulation and the
trellis-type (Viterbi) decoding exploits the phase constraints in the CPM signal.

FIGURE 5-3-9 Power-bandwidth tradeoff for partial response CPM
signals with raised cosine pulses. W is the 99 percent in-band power
bandwidth. [From Sundberg (1986). © 1986 IEEE.]
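
As a small check on the raised cosine pulse (5-3-23), the sketch below integrates g(t) to obtain the phase response q(t) of (5-3-5) and confirms that it reaches 1/2 at t = LT, as required of a CPM pulse. The closed form used in `q_rc` follows from integrating (5-3-23) term by term. The function names and the midpoint-rule integrator are illustrative.

```python
import math


def g_rc(t, L, T=1.0):
    """Raised cosine frequency pulse of length LT, per (5-3-23)."""
    if 0.0 <= t <= L * T:
        return (1.0 - math.cos(2.0 * math.pi * t / (L * T))) / (2.0 * L * T)
    return 0.0


def q_rc(t, L, T=1.0):
    """Closed-form integral of g_rc from 0 to t."""
    if t <= 0.0:
        return 0.0
    if t >= L * T:
        return 0.5
    return t / (2.0 * L * T) - math.sin(2.0 * math.pi * t / (L * T)) / (4.0 * math.pi)


def integrate_g(t, L, T=1.0, steps=20000):
    """Midpoint-rule integral of g_rc from 0 to t, as a numerical cross-check."""
    dt = t / steps
    return sum(g_rc((i + 0.5) * dt, L, T) for i in range(steps)) * dt


if __name__ == "__main__":
    for L in (1, 2, 3):
        print(L, round(integrate_g(L * 1.0, L), 6), q_rc(0.4 * L, L))
```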

Additional gains in performance can be achieved by introducing additional
redundancy through coding and increasing the alphabet size as a means of
maintaining a fixed bandwidth. In particular, trellis-coded CPM using relatively
simple convolutional codes has been thoroughly investigated and many results
are available in the technical literature. The Viterbi decoder for the
convolutionally encoded CPM signal now exploits the memory inherent in the code
and in the CPM signal. Performance gains of the order of 4-6 dB, relative to
uncoded MSK with the same bandwidth, have been demonstrated by combining
convolutional coding with CPM. Extensive numerical results for coded
CPM are given by Lindell (1985).

Multi-h CPM  By varying the modulation index from one signaling interval
to another, it is possible to increase the minimum euclidean distance $\delta_{\min}^2$
between pairs of phase trajectories and, thus, improve the performance gain
over constant-h CPM. Usually, multi-h CPM employs a fixed number H of
modulation indices that are varied cyclically in successive signaling intervals.
Thus, the phase of the signal varies piecewise linearly.

Significant gains in SNR are achievable by using only a small number of
different values of h. For example, with full response (L = 1) CPM and H = 2,
it is possible to obtain a gain of 3 dB relative to binary or quaternary PSK. By
increasing H to H = 4, a gain of 4.5 dB relative to PSK can be obtained. The
performance gain can also be increased with an increase in the signal alphabet.
Table 5-3-1 lists the performance gains achieved with M = 2, 4, and 8 for
several values of H. The upper bounds on the minimum euclidean distance are
also shown in Fig. 5-3-10 for several values of M and H. Note that the major
gain in performance is obtained when H is increased from H = 1 to H = 2. For
H > 2, the additional gain is relatively small for small values of $\{h_i\}$. On the
other hand, significant performance gains are achieved by increasing the
alphabet size M.

The results shown above hold for full response CPM. One can also extend
the use of multi-h CPM to partial response in an attempt to further improve
performance. It is anticipated that such schemes will yield some additional
performance gains, but numerical results on partial response, multi-h CPM are
limited. The interested reader is referred to the paper by Aulin and Sundberg
(1982b).

Multiamplitude CPM Multiamplitude CPM (MACPM) is basically a 
combined amplitude and phase digital modulation scheme that allows us to 
increase the signaling alphabet relative to CPM in another dimension and, 
thus, to achieve higher data rates on a band-limited channel. Simultaneously, 
the combination of multiple amplitude in conjunction with CPM results in a 
bandwidth-efficient modulation technique. 

We have already observed the spectral characteristics of MACPM in Section 
4-3. The performance characteristics of MACPM have been investigated by 
Mulligan (1988) for both uncoded and trellis-coded CPM. Of particular interest 
is the result that trellis-coded CPM with two amplitude levels achieves a gain 
of 3-4 dB relative to MSK without a significant increase in the signal bandwidth. 

5-3-3 Symbol-by-Symbol Detection of CPM Signals 

Besides the ML sequence detector, there are other types of detectors that can
be used to recover the information sequence in a CPM signal. In this section,
we consider symbol-by-symbol detectors. One type of symbol-by-symbol
detector is the one described in Section 5-1-5, which exploits the memory of
CPM by performing matched filtering or cross-correlation over several
signaling intervals. Because of its computational complexity, however, this
recursive algorithm has not been directly applied to the detection of CPM.
Instead, two similar, albeit suboptimal, symbol-by-symbol detection methods
have been described in the papers by deBuda (1972), Osborne and Luntz
(1974), and Schonhoff (1976). One of these is functionally equivalent to the
algorithm given in Section 5-1-5, and the second is a suboptimum approximation
of the first. We shall describe these two methods in the context of
demodulation of CPFSK signals, for which these detection algorithms have
been applied directly.

FIGURE 5-3-10 Upper bounds on minimum squared euclidean distance for
various M and H values. [From Aulin and Sundberg (1982b). © 1982 IEEE.]

TABLE 5-3-1 MAXIMUM VALUES OF THE UPPER BOUND $d_B^2$ FOR MULTI-h LINEAR
PHASE CPM(a)

   M   H   Max d_B^2   dB gain compared   h_1     h_2     h_3     h_4     average h
                       with MSK
   2   1   2.43        0.85               0.715                           0.715
   2   2   4.0         3.0                0.5     0.5                     0.5
   2   3   4.88        3.87               0.620   0.686   0.714           0.673
   2   4   5.69        4.54               0.73    0.55    0.73    0.55    0.64
   4   1   4.23        3.25               0.914                           0.914
   4   2   6.54        5.15               0.772   0.772                   0.772
   4   3   7.65        5.83               0.795   0.795   0.795           0.795
   8   1   6.14        4.87               0.964                           0.964
   8   2   7.50        5.74               0.883   0.883                   0.883
   8   3   8.40        6.23               0.879   0.879   0.879           0.879

(a) From Aulin and Sundberg (1982b).

FIGURE 5-3-11 Block diagram of demodulator for detection of CPFSK.

To describe these methods, we assume that the signal is observed over the
present signaling interval and D signaling intervals into the future in deciding
on the information symbol transmitted in the present signaling interval. A
block diagram of the demodulator, implemented as a bank of cross-correlators,
is shown in Fig. 5-3-11. Recall that the transmitted CPFSK signal during the
nth signaling interval is

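$$s(t) = \operatorname{Re}\,[v(t)e^{j2\pi f_c t}]$$

where

$$v(t) = \sqrt{\frac{2\mathcal{E}}{T}}\,\exp\left\{ j\left[ \frac{\pi h[t-(n-1)T]I_n}{T} + \pi h \sum_{k=0}^{n-1} I_k + \phi_0 \right] \right\}$$

$h = 2f_dT$ is the modulation index, $f_d$ is the peak frequency deviation, and
$\phi_0$ is the initial phase angle of the carrier.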

In detecting the symbol $I_1$, the cross-correlations shown in Fig. 5-3-11 are
performed with the reference signals $s(t, I_1, I_2, \ldots, I_{1+D})$ for all
$M^{D+1}$ possible values of the symbols $I_1, I_2, \ldots, I_{1+D}$ transmitted
over the $D+1$ signaling intervals. But these correlations in effect generate the
variables $r_1, r_2, \ldots, r_{1+D}$, which in turn are the arguments of the
exponentials that occur in the pdf

$$p(r_1, r_2, \ldots, r_{1+D} \mid I_1, I_2, \ldots, I_{1+D})$$

Finally, the summations over the $M^D$ possible values of the symbols
$I_2, I_3, \ldots, I_{1+D}$ represent the averaging of

$$p(r_1, r_2, \ldots, r_{1+D} \mid I_1, I_2, \ldots, I_{1+D})\,P(I_1, I_2, \ldots, I_{1+D})$$

over the $M^D$ possible values of these symbols. The M outputs of the
demodulator constitute the decision variables from which the largest is selected
to form the demodulated symbol. Consequently, the metrics generated by the
demodulator shown in Fig. 5-3-11 are equivalent to the decision variables given
by (5-1-68) on which the decision on $I_1$ is based.

Signals received in subsequent signaling intervals are demodulated in the
same manner. That is, the demodulator cross-correlates the signal received
over $D+1$ signaling intervals with the $M^{D+1}$ possible transmitted signals and
forms the decision variables as illustrated in Fig. 5-3-11. Thus the decision
made on the mth signaling interval is based on the cross-correlations
performed over the signaling intervals $m, m+1, \ldots, m+D$. The initial phase
in the correlation interval of duration $(D+1)T$ is assumed to be known. On
the other hand, the algorithm described by (5-1-76) and (5-1-77) involves an
additional averaging operation over the previously detected symbols. In this
respect, the demodulator shown in Fig. 5-3-11 differs from the recursive
algorithm described above. However, the difference is insignificant.

One suboptimum demodulation method that performs almost as well as the
optimum method embodied in Fig. 5-3-11 bases its decision on the largest
output from the bank of $M^{D+1}$ cross-correlators. Thus the exponential
functions and the summations are eliminated. But this method is equivalent
to selecting the symbol $I_m$ for which the probability density function
$p(r_m, r_{m+1}, \ldots, r_{m+D} \mid I_m, I_{m+1}, \ldots, I_{m+D})$ is a maximum.
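
The suboptimum rule just described (pick the largest of the $M^{D+1}$ cross-correlator outputs and keep only the first symbol of the winning sequence) can be sketched in a few lines. This is only an illustration under stated assumptions: the lowpass signal is sampled with `ns` samples per symbol, has unit amplitude and zero initial phase, and the function names `cpfsk_samples` and `detect_first_symbol` are hypothetical, not taken from the text.

```python
import cmath
import itertools
import math


def cpfsk_samples(symbols, h, ns):
    """Unit-amplitude lowpass CPFSK samples (assumed zero initial phase)."""
    phase = 0.0
    out = []
    for sym in symbols:
        for _ in range(ns):
            out.append(cmath.exp(1j * phase))
            phase += math.pi * h * sym / ns  # phase advances by pi*h*I per symbol
    return out


def detect_first_symbol(received, alphabet, depth, h, ns):
    """Correlate against all M**(depth+1) sequences; return the first symbol
    of the sequence with the largest coherent correlation."""
    best_metric, best_seq = -math.inf, None
    for seq in itertools.product(alphabet, repeat=depth + 1):
        ref = cpfsk_samples(seq, h, ns)
        metric = sum((r * s.conjugate()).real for r, s in zip(received, ref))
        if metric > best_metric:
            best_metric, best_seq = metric, seq
    return best_seq[0]


tx = (1, -1, 1)                          # hypothetical binary sequence, D = 2
rx = cpfsk_samples(tx, h=0.715, ns=16)   # noise-free observation over D + 1 intervals
decision = detect_first_symbol(rx, alphabet=(-1, 1), depth=2, h=0.715, ns=16)
print(decision)
```

The exhaustive search grows as $M^{D+1}$, which is why the text notes that only a few symbols of look-ahead are used in practice.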

The performance of the detector shown in Fig. 5-3-11 has been upper-
bounded and evaluated numerically. Figure 5-3-12 illustrates the performance
of binary CPFSK with $n = D + 1$ as a parameter. The modulation index
$h = 0.715$ used in generating these results minimizes the probability of error as
shown by Schonhoff (1976). We note that an improvement of about 2.5 dB is
obtained relative to orthogonal FSK ($n = 1$) by a demodulator that cross-
correlates over two symbols. An additional gain of approximately 1.5 dB is
obtained by extending the correlation time to three symbols. Further extension
of the correlation time results in a relatively small additional gain.

FIGURE 5-3-12 Performance of binary CPFSK with coherent detection.
[Horizontal axis: SNR per bit, $\gamma_b$ (dB).]

FIGURE 5-3-13 Performance of quaternary CPFSK with coherent detection.
[Horizontal axis: SNR per bit, $\gamma_b$ (dB).]

Similar results are obtained with larger alphabet sizes. For example, Figs.
5-3-13 and 5-3-14 illustrate the performance improvements for quaternary and
octal CPFSK, respectively. The modulation indices given in these graphs are
the ones that minimize the probability of a symbol error.

FIGURE 5-3-14 Performance of octal CPFSK with coherent detection.

Instead of performing coherent detection, which requires knowledge of the
carrier phase $\phi_0$, we may assume that $\phi_0$ is uniformly distributed over the
interval 0 to $2\pi$, and average over it in arriving at the decision variables. Thus
coherent integration (cross-correlation) is performed over the $n = D + 1$
signaling intervals, but the outputs of the correlators are envelope-detected.
This is called noncoherent detection of CPFSK. In this detection scheme,
performance is optimized by selecting n to be odd and making the decision on
the middle symbol in the sequence of n symbols. The numerical results on the
probability of error for noncoherent detection of CPFSK are similar to the
results illustrated above for coherent detection. That is, a gain of 2-3 dB in
performance is achieved by increasing the correlation interval from $n = 1$ to
$n = 3$ and to $n = 5$.


5-4 OPTIMUM RECEIVER FOR SIGNALS WITH 
RANDOM PHASE IN AWGN CHANNEL 

In this section, we consider the design of the optimum receiver for carrier 
modulated signals when the carrier phase is unknown at the receiver and no 
attempt is made to estimate its value. Uncertainty in the carrier phase of the 
received signal may be due to one or more of the following reasons: First, the 
oscillators that are used at the transmitter and the receiver to generate 
the carrier signals are generally not phase synchronous. Second, the time delay 
in the propagation of the signal from the transmitter to the receiver is not 
generally known precisely. To elaborate on this point, a transmitted signal of
the form

$$s(t) = \operatorname{Re}\,[g(t)e^{j2\pi f_c t}]$$

that propagates through a channel with delay $t_0$ will be received as

$$s(t - t_0) = \operatorname{Re}\,[g(t - t_0)e^{j2\pi f_c (t - t_0)}] = \operatorname{Re}\,[g(t - t_0)e^{-j2\pi f_c t_0}e^{j2\pi f_c t}]$$

The carrier phase shift due to the propagation delay $t_0$ is

$$\phi = -2\pi f_c t_0$$

Note that large changes in the carrier phase $\phi$ can occur due to relatively small
changes in the propagation delay. For example, if the carrier frequency
$f_c = 1$ MHz, an uncertainty or a change in the propagation delay of 0.5 μs will
cause a phase uncertainty of $\pi$ rad. In some channels (e.g., radio channels) the
time delay in the propagation of the signal from the transmitter to the receiver
may change rapidly and in an apparently random manner, so that the carrier
phase of the received signal varies in an apparently random fashion.
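
As a quick numerical check of the relation $\phi = -2\pi f_c t_0$, the sketch below evaluates the phase change for a 1 MHz carrier and a delay change of 0.5 μs, which comes out to a magnitude of $\pi$ rad. The function name `carrier_phase_shift` is a hypothetical helper introduced only for this illustration.

```python
import math


def carrier_phase_shift(fc_hz, delay_s):
    """Carrier phase shift phi = -2*pi*fc*t0 in radians (not wrapped)."""
    return -2.0 * math.pi * fc_hz * delay_s


phi = carrier_phase_shift(1e6, 0.5e-6)   # 1 MHz carrier, 0.5 microsecond delay change
print(abs(phi) / math.pi)                # magnitude of the phase change in units of pi
```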

In the absence of knowledge of the carrier phase, we may treat this signal 
parameter as a random variable and determine the form of the optimum 
receiver for recovering the transmitted information from the received signal. 
First, we treat the case of binary signals and, then, we consider M-ary signals. 


5-4-1 Optimum Receiver for Binary Signals 

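We consider a binary communication system that uses the two carrier
modulated signals $s_1(t)$ and $s_2(t)$ to transmit the information, where

$$s_m(t) = \operatorname{Re}\,[s_{lm}(t)e^{j2\pi f_c t}], \quad m = 1, 2, \quad 0 \le t \le T \tag{5-4-1}$$

and $s_{lm}(t)$, $m = 1, 2$, are the equivalent lowpass signals. The two signals are
assumed to have equal energy

$$\mathcal{E} = \int_0^T s_m^2(t)\,dt = \frac{1}{2}\int_0^T |s_{lm}(t)|^2\,dt \tag{5-4-2}$$

and are characterized by the complex-valued correlation coefficient

$$\rho_{12} \equiv \rho = \frac{1}{2\mathcal{E}}\int_0^T s_{l1}^*(t)\,s_{l2}(t)\,dt \tag{5-4-3}$$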

The received signal is assumed to be a phase-shifted version of the
transmitted signal and corrupted by the additive noise

$$n(t) = \operatorname{Re}\,\{[n_c(t) + jn_s(t)]e^{j2\pi f_c t}\} = \operatorname{Re}\,[z(t)e^{j2\pi f_c t}] \tag{5-4-4}$$

Hence, the received signal may be expressed as

$$r(t) = \operatorname{Re}\,\{[s_{lm}(t)e^{j\phi} + z(t)]e^{j2\pi f_c t}\} \tag{5-4-5}$$

where

$$r_l(t) = s_{lm}(t)e^{j\phi} + z(t), \quad 0 \le t \le T \tag{5-4-6}$$

is the equivalent lowpass received signal. This received signal is now passed
through a demodulator whose sampled output at $t = T$ is passed to the
detector.


The Optimum Demodulator In Section 5-1-1, we demonstrated that if the
received signal was correlated with a set of orthonormal functions $\{f_n(t)\}$ that
spanned the signal space, the outputs from the bank of correlators provide a
set of sufficient statistics for the detector to make a decision that minimizes the
probability of error. We also demonstrated that a bank of matched filters could
be substituted for the bank of correlators.

A similar orthonormal decomposition can also be employed for a received 
signal with an unknown carrier phase. However, it is mathematically con- 
venient to deal with the equivalent lowpass signal and to specify the signal 
correlators or matched filters in terms of the equivalent lowpass signal 
waveforms. 

To be specific, the impulse response $h_l(t)$ of a filter that is matched to the
complex-valued equivalent lowpass signal $s_l(t)$, $0 \le t \le T$, is given as (see
Problem 5-6)

$$h_l(t) = s_l^*(T - t) \tag{5-4-7}$$

and the output of such a filter at $t = T$ is simply

$$\int_0^T |s_l(t)|^2\,dt = 2\mathcal{E} \tag{5-4-8}$$

where $\mathcal{E}$ is the signal energy. A similar result is obtained if the signal $s_l(t)$ is
correlated with $s_l^*(t)$ and the correlator is sampled at $t = T$. Therefore, the
optimum demodulator for the equivalent lowpass received signal $r_l(t)$ given in
(5-4-6) may be realized by two matched filters in parallel, one matched to $s_{l1}(t)$
and the other to $s_{l2}(t)$, as shown in Fig. 5-4-1. The outputs of the matched
filters or correlators at the sampling instant are the two complex numbers

$$r_m = r_{mc} + jr_{ms}, \quad m = 1, 2 \tag{5-4-9}$$

Suppose that the transmitted signal is $s_1(t)$. Then, it is easily shown (see
Problem 5-35) that

$$r_1 = 2\mathcal{E}\cos\phi + n_{1c} + j(2\mathcal{E}\sin\phi + n_{1s})$$
$$r_2 = 2\mathcal{E}|\rho|\cos(\phi - \alpha_0) + n_{2c} + j[2\mathcal{E}|\rho|\sin(\phi - \alpha_0) + n_{2s}] \tag{5-4-10}$$

where $\rho$ is the complex-valued correlation coefficient of the two signals $s_{l1}(t)$
and $s_{l2}(t)$, which may be expressed as $\rho = |\rho|\exp(j\alpha_0)$. The random noise
variables $n_{1c}$, $n_{1s}$, $n_{2c}$, and $n_{2s}$ are jointly gaussian, with zero mean and equal
variance.

FIGURE 5-4-1 Optimum receiver for binary signals.


The Optimum Detector The optimum detector observes the random
variables $[r_{1c}\ r_{1s}\ r_{2c}\ r_{2s}] = \mathbf{r}$, where $r_1 = r_{1c} + jr_{1s}$ and $r_2 = r_{2c} + jr_{2s}$, and bases its
decision on the posterior probabilities $P(s_m \mid \mathbf{r})$, $m = 1, 2$. These probabilities
may be expressed as

$$P(s_m \mid \mathbf{r}) = \frac{p(\mathbf{r} \mid s_m)P(s_m)}{p(\mathbf{r})}, \quad m = 1, 2 \tag{5-4-11}$$

and, hence, the optimum decision rule may be expressed as

$$P(s_1 \mid \mathbf{r}) \underset{s_2}{\overset{s_1}{\gtrless}} P(s_2 \mid \mathbf{r})$$

or, equivalently,

$$\frac{p(\mathbf{r} \mid s_1)}{p(\mathbf{r} \mid s_2)} \underset{s_2}{\overset{s_1}{\gtrless}} \frac{P(s_2)}{P(s_1)} \tag{5-4-12}$$

The ratio of pdfs on the left-hand side of (5-4-12) is the likelihood ratio, which
we denote as

$$\Lambda(\mathbf{r}) = \frac{p(\mathbf{r} \mid s_1)}{p(\mathbf{r} \mid s_2)} \tag{5-4-13}$$

The right-hand side of (5-4-12) is the ratio of the two prior probabilities, which
takes the value of unity when the two signals are equally probable.

The probability density functions $p(\mathbf{r} \mid s_1)$ and $p(\mathbf{r} \mid s_2)$ can be obtained by
averaging the pdfs $p(\mathbf{r} \mid s_m, \phi)$ over the pdf of the random carrier phase, i.e.,

$$p(\mathbf{r} \mid s_m) = \int_0^{2\pi} p(\mathbf{r} \mid s_m, \phi)\,p(\phi)\,d\phi \tag{5-4-14}$$

We shall perform the integration indicated in (5-4-14) for the special case in
which the two signals are orthogonal, i.e., $\rho = 0$. In this case, the outputs of the
demodulator are

$$r_1 = r_{1c} + jr_{1s} = 2\mathcal{E}\cos\phi + n_{1c} + j(2\mathcal{E}\sin\phi + n_{1s})$$
$$r_2 = r_{2c} + jr_{2s} = n_{2c} + jn_{2s} \tag{5-4-15}$$

where $(n_{1c}, n_{1s}, n_{2c}, n_{2s})$ are mutually uncorrelated and, hence, statistically
independent, zero-mean gaussian random variables (see Problem 5-25). Hence,
the joint pdf of $\mathbf{r} = [r_{1c}\ r_{1s}\ r_{2c}\ r_{2s}]$ may be expressed as a product of the
marginal pdfs. Consequently,

$$p(r_{1c}, r_{1s} \mid s_1, \phi) = \frac{1}{2\pi\sigma^2}\exp\left[-\frac{(r_{1c} - 2\mathcal{E}\cos\phi)^2 + (r_{1s} - 2\mathcal{E}\sin\phi)^2}{2\sigma^2}\right]$$
$$p(r_{2c}, r_{2s}) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{r_{2c}^2 + r_{2s}^2}{2\sigma^2}\right) \tag{5-4-16}$$

where $\sigma^2 = 2\mathcal{E}N_0$.

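The uniform pdf for the carrier phase $\phi$ represents the most ignorance that
can be exhibited by the detector. This is called the least favorable pdf for $\phi$.
With $p(\phi) = 1/2\pi$, $0 \le \phi \le 2\pi$, substituted into the integral in (5-4-14), we
obtain

$$\frac{1}{2\pi}\int_0^{2\pi} p(r_{1c}, r_{1s} \mid s_1, \phi)\,d\phi = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{r_{1c}^2 + r_{1s}^2 + 4\mathcal{E}^2}{2\sigma^2}\right)\frac{1}{2\pi}\int_0^{2\pi}\exp\left[\frac{2\mathcal{E}(r_{1c}\cos\phi + r_{1s}\sin\phi)}{\sigma^2}\right]d\phi \tag{5-4-17}$$

But

$$\frac{1}{2\pi}\int_0^{2\pi}\exp\left[\frac{2\mathcal{E}(r_{1c}\cos\phi + r_{1s}\sin\phi)}{\sigma^2}\right]d\phi = I_0\left(\frac{2\mathcal{E}\sqrt{r_{1c}^2 + r_{1s}^2}}{\sigma^2}\right) \tag{5-4-18}$$

where $I_0(x)$ is the modified Bessel function of zeroth order, defined in
(2-1-120).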

By performing a similar integration as in (5-4-17) under the assumption that
the signal $s_2(t)$ was transmitted, we obtain the result

$$p(r_{2c}, r_{2s} \mid s_2) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{r_{2c}^2 + r_{2s}^2 + 4\mathcal{E}^2}{2\sigma^2}\right) I_0\left(\frac{2\mathcal{E}\sqrt{r_{2c}^2 + r_{2s}^2}}{\sigma^2}\right) \tag{5-4-19}$$

When we substitute these results into the likelihood ratio given by (5-4-13),
we obtain the result

$$\Lambda(\mathbf{r}) = \frac{I_0\left(2\mathcal{E}\sqrt{r_{1c}^2 + r_{1s}^2}/\sigma^2\right)}{I_0\left(2\mathcal{E}\sqrt{r_{2c}^2 + r_{2s}^2}/\sigma^2\right)} \underset{s_2}{\overset{s_1}{\gtrless}} \frac{P(s_2)}{P(s_1)} \tag{5-4-20}$$

Thus, the optimum detector computes the two envelopes $\sqrt{r_{1c}^2 + r_{1s}^2}$ and
$\sqrt{r_{2c}^2 + r_{2s}^2}$ and the corresponding values of the Bessel function
$I_0(2\mathcal{E}\sqrt{r_{1c}^2 + r_{1s}^2}/\sigma^2)$ and $I_0(2\mathcal{E}\sqrt{r_{2c}^2 + r_{2s}^2}/\sigma^2)$ to form the likelihood ratio. We
observe that this computation requires knowledge of the noise variance $\sigma^2$.
The likelihood ratio is then compared with the threshold $P(s_2)/P(s_1)$ to
determine which signal was transmitted.

FIGURE 5-4-2 Graph of $I_0(x)$.

A significant simplification in the implementation of the optimum detector
occurs when the two signals are equally probable. In such a case the threshold
becomes unity, and, due to the monotonicity of the Bessel function shown in
Fig. 5-4-2, the optimum detection rule simplifies to

$$\sqrt{r_{1c}^2 + r_{1s}^2} \underset{s_2}{\overset{s_1}{\gtrless}} \sqrt{r_{2c}^2 + r_{2s}^2} \tag{5-4-21}$$

Thus, the optimum detector bases its decision on the two envelopes $\sqrt{r_{1c}^2 + r_{1s}^2}$
and $\sqrt{r_{2c}^2 + r_{2s}^2}$ and, hence, it is called an envelope detector.

We observe that the computation of the envelopes of the received signal
samples at the output of the demodulator renders the carrier phase irrelevant
in the decision as to which signal was transmitted. Equivalently, the decision
may be based on the computation of the squared envelopes $r_{1c}^2 + r_{1s}^2$ and
$r_{2c}^2 + r_{2s}^2$, in which case the detector is called a square-law detector.
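
The equivalence between the likelihood-ratio test (5-4-20) with unit threshold and the envelope comparison (5-4-21) can be checked numerically. The sketch below is illustrative only: the correlator outputs are made-up numbers, the helper names are hypothetical, and $I_0(x)$ is evaluated from its power series rather than from any particular library.

```python
import math


def bessel_i0(x, terms=60):
    """Power series I0(x) = sum over k of ((x/2)**(2k)) / (k!)**2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def likelihood_ratio(r1, r2, energy, sigma2):
    """Ratio of Bessel terms in (5-4-20) for complex correlator outputs r1, r2."""
    return (bessel_i0(2.0 * energy * abs(r1) / sigma2)
            / bessel_i0(2.0 * energy * abs(r2) / sigma2))


def envelope_decision(r1, r2):
    """Rule (5-4-21) for equally probable signals: pick the larger envelope."""
    return 1 if abs(r1) > abs(r2) else 2


# Assumed example: signal 1 sent with energy 1 and an arbitrary carrier phase.
r1 = 2.0 * complex(math.cos(2.1), math.sin(2.1)) + complex(0.3, -0.2)
r2 = complex(-0.4, 0.5)
print(envelope_decision(r1, r2), likelihood_ratio(r1, r2, energy=1.0, sigma2=2.0) > 1.0)
```

Because $I_0(x)$ is monotonically increasing for $x \ge 0$, the two tests always agree when the prior probabilities are equal, and the envelope rule needs no knowledge of $\sigma^2$.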

Binary FSK signals are an example of binary orthogonal signals. Recall that
in binary FSK we employ two different frequencies, say $f_1$ and $f_2 = f_1 + \Delta f$, to
transmit a binary information sequence. The choice of minimum frequency
separation $\Delta f = f_2 - f_1$ is considered below. Thus, the signal waveforms may be
expressed as

$$s_1(t) = \sqrt{2\mathcal{E}_b/T_b}\,\cos 2\pi f_1 t, \quad 0 \le t \le T_b$$
$$s_2(t) = \sqrt{2\mathcal{E}_b/T_b}\,\cos 2\pi f_2 t, \quad 0 \le t \le T_b \tag{5-4-22}$$

and their equivalent lowpass counterparts are

$$s_{l1}(t) = \sqrt{2\mathcal{E}_b/T_b}, \quad 0 \le t \le T_b$$
$$s_{l2}(t) = \sqrt{2\mathcal{E}_b/T_b}\,e^{j2\pi\Delta f t}, \quad 0 \le t \le T_b \tag{5-4-23}$$

The received signal may be expressed as

$$r(t) = \sqrt{\frac{2\mathcal{E}_b}{T_b}}\cos(2\pi f_m t + \phi_m) + n(t) \tag{5-4-24}$$

where $\phi_m$ is the phase of the carrier frequency $f_m$. The demodulation of the
real signal $r(t)$ may be accomplished, as shown in Fig. 5-4-3, by using four
correlators with the basis functions

$$f_{1m}(t) = \sqrt{\frac{2}{T}}\cos[(2\pi f_1 + 2\pi m\,\Delta f)t], \quad m = 0, 1$$
$$f_{2m}(t) = \sqrt{\frac{2}{T}}\sin[(2\pi f_1 + 2\pi m\,\Delta f)t], \quad m = 0, 1 \tag{5-4-25}$$

where $T = T_b$ for binary signaling.

FIGURE 5-4-3 Demodulation and square-law detection of binary FSK signals.


The four outputs of the correlators are sampled at the end of each signal
interval and passed to the detector. If the mth signal is transmitted, the four
samples at the detector may be expressed as

$$r_{kc} = \sqrt{\mathcal{E}_b}\left[\frac{\sin[2\pi(k-m)\Delta fT]}{2\pi(k-m)\Delta fT}\cos\phi_m - \frac{\cos[2\pi(k-m)\Delta fT] - 1}{2\pi(k-m)\Delta fT}\sin\phi_m\right] + n_{kc}, \quad k, m = 1, 2$$
$$r_{ks} = \sqrt{\mathcal{E}_b}\left[\frac{\cos[2\pi(k-m)\Delta fT] - 1}{2\pi(k-m)\Delta fT}\cos\phi_m + \frac{\sin[2\pi(k-m)\Delta fT]}{2\pi(k-m)\Delta fT}\sin\phi_m\right] + n_{ks}, \quad k, m = 1, 2 \tag{5-4-26}$$

where $n_{kc}$ and $n_{ks}$ denote the gaussian noise components in the sampled
outputs.

We observe that when $k = m$, the sampled values to the detector are

$$r_{mc} = \sqrt{\mathcal{E}_b}\cos\phi_m + n_{mc}$$
$$r_{ms} = \sqrt{\mathcal{E}_b}\sin\phi_m + n_{ms} \tag{5-4-27}$$

Furthermore, we observe that when $k \ne m$, the signal components in the
samples $r_{kc}$ and $r_{ks}$ will vanish, independently of the values of the phase shifts
$\phi_k$, provided that the frequency separation between successive frequencies is
$\Delta f = 1/T$. In such a case, the other two correlator outputs consist of noise
only, i.e.,

$$r_{kc} = n_{kc}, \quad r_{ks} = n_{ks}, \quad k \ne m \tag{5-4-28}$$

With a frequency separation of $\Delta f = 1/T$, the relations (5-4-27) and (5-4-28)
are consistent with the previous result (5-4-15) for the demodulator outputs.
Therefore, we conclude that for envelope or square-law detection of FSK
signals, the minimum frequency separation required for orthogonality of the
signals is $\Delta f = 1/T$. This separation is twice as large as that required when the
detection is phase-coherent.
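
The factor of two between the noncoherent and coherent spacing requirements can be seen from the normalized correlation of two lowpass tones, $\rho(\Delta f) = \frac{1}{T}\int_0^T e^{j2\pi\Delta f t}\,dt$. Envelope detection needs $|\rho| = 0$, whereas phase-coherent detection only needs $\operatorname{Re}\rho = 0$. The following sketch evaluates the closed form of this integral; the function name `tone_correlation` is a hypothetical helper for illustration.

```python
import cmath
import math


def tone_correlation(delta_f_times_t):
    """(1/T) * integral over [0, T] of exp(j*2*pi*delta_f*t) dt, in closed form."""
    x = 2.0 * math.pi * delta_f_times_t
    if x == 0.0:
        return complex(1.0, 0.0)
    return (cmath.exp(1j * x) - 1.0) / (1j * x)


for spacing in (0.5, 1.0):
    rho = tone_correlation(spacing)
    # |rho| governs envelope (noncoherent) detection; Re(rho) governs coherent detection
    print(spacing, round(abs(rho), 4), round(rho.real, 4))
```

At $\Delta f = 1/2T$ the real part vanishes but the magnitude is $2/\pi$, so the tones are orthogonal only for a coherent receiver; at $\Delta f = 1/T$ the complex correlation itself vanishes.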

5-4-2 Optimum Receiver for M-ary Orthogonal Signals

The generalization of the optimum demodulator and detector to the case of
M-ary orthogonal signals is straightforward. If the equal energy and equally
probable signal waveforms are represented as

$$s_m(t) = \operatorname{Re}\,[s_{lm}(t)e^{j2\pi f_c t}], \quad m = 1, 2, \ldots, M, \quad 0 \le t \le T \tag{5-4-29}$$

where $s_{lm}(t)$ are the equivalent lowpass signals, the optimum correlation-type
or matched-filter-type demodulator produces the M complex-valued random
variables

$$r_m = r_{mc} + jr_{ms} = \int_0^T r_l(t)\,s_{lm}^*(t)\,dt, \quad m = 1, 2, \ldots, M \tag{5-4-30}$$

where $r_l(t)$ is the equivalent lowpass received signal. Then, the optimum
detector, based on a random, uniformly distributed carrier phase, computes the
M envelopes

$$|r_m| = \sqrt{r_{mc}^2 + r_{ms}^2}, \quad m = 1, 2, \ldots, M \tag{5-4-31}$$

or, equivalently, the squared envelopes $|r_m|^2$, and selects the signal with the
largest envelope (or squared envelope).

In the special case of M-ary orthogonal FSK signals, the optimum receiver
has the structure illustrated in Fig. 5-4-4. There are 2M correlators: two for
each possible transmitted frequency. The minimum frequency separation
between adjacent frequencies to maintain orthogonality is $\Delta f = 1/T$.
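
A discrete-time sketch of this receiver is given below. It approximates the integral in (5-4-30) by a sum over uniformly spaced complex lowpass samples, forms the M envelopes of (5-4-31), and selects the largest. The sampling setup, the zero-based tone index, and the function name `noncoherent_fsk_detect` are assumptions made for the illustration.

```python
import cmath
import math


def noncoherent_fsk_detect(samples, num_tones, delta_f_t):
    """Pick the tone whose correlator output has the largest envelope.

    samples: complex lowpass samples over one symbol (uniformly spaced).
    delta_f_t: tone spacing times symbol duration (1.0 gives orthogonal tones).
    """
    n = len(samples)
    envelopes = []
    for m in range(num_tones):
        # correlate with the conjugate of the m-th lowpass tone
        corr = sum(r * cmath.exp(-2j * math.pi * m * delta_f_t * i / n)
                   for i, r in enumerate(samples))
        envelopes.append(abs(corr))
    return max(range(num_tones), key=envelopes.__getitem__), envelopes


# Assumed noise-free test: tone index 2 of M = 4 with an unknown carrier phase.
n, phase = 64, 1.234
rx = [cmath.exp(1j * (2.0 * math.pi * 2 * i / n + phase)) for i in range(n)]
index, env = noncoherent_fsk_detect(rx, num_tones=4, delta_f_t=1.0)
print(index)
```

Each complex correlation corresponds to the pair of real (cosine and sine) correlators per frequency in Fig. 5-4-4, and the decision does not depend on the value of `phase`.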

5-4-3 Probability of Error for Envelope Detection of M-ary
Orthogonal Signals

Let us consider the transmission of M-ary orthogonal equal energy signals over
an AWGN channel, which are envelope-detected at the receiver. We also
assume that the M signals are equally probable a priori and that the signal $s_1(t)$
is transmitted in the signal interval $0 \le t \le T$.

FIGURE 5-4-4 Demodulation of M-ary FSK signals for noncoherent detection.

The M decision metrics at the detector are the M envelopes

$$|r_m| = \sqrt{r_{mc}^2 + r_{ms}^2}, \quad m = 1, 2, \ldots, M \tag{5-4-32}$$

where

$$r_{1c} = \sqrt{\mathcal{E}_s}\cos\phi_1 + n_{1c}$$
$$r_{1s} = \sqrt{\mathcal{E}_s}\sin\phi_1 + n_{1s} \tag{5-4-33}$$

and

$$r_{mc} = n_{mc}, \quad r_{ms} = n_{ms}, \quad m = 2, 3, \ldots, M \tag{5-4-34}$$

The additive noise components $\{n_{mc}\}$ and $\{n_{ms}\}$ are mutually statistically
independent zero-mean gaussian variables with equal variance $\sigma^2 = \frac{1}{2}N_0$. Thus
the pdfs of the random variables at the input to the detector are

$$p_{r_1}(r_{1c}, r_{1s}) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{r_{1c}^2 + r_{1s}^2 + \mathcal{E}_s}{2\sigma^2}\right) I_0\left(\frac{\sqrt{\mathcal{E}_s(r_{1c}^2 + r_{1s}^2)}}{\sigma^2}\right) \tag{5-4-35}$$

$$p_{r_m}(r_{mc}, r_{ms}) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{r_{mc}^2 + r_{ms}^2}{2\sigma^2}\right), \quad m = 2, 3, \ldots, M \tag{5-4-36}$$


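Let us make a change in variables in the joint pdfs given by (5-4-35) and
(5-4-36). We define the normalized variables

$$R_m = \frac{\sqrt{r_{mc}^2 + r_{ms}^2}}{\sigma}, \qquad \Theta_m = \tan^{-1}\frac{r_{ms}}{r_{mc}} \tag{5-4-37}$$

Clearly, $r_{mc} = \sigma R_m\cos\Theta_m$ and $r_{ms} = \sigma R_m\sin\Theta_m$. The Jacobian of this transfor-
mation is

$$|J| = \begin{vmatrix} \sigma\cos\Theta_m & \sigma\sin\Theta_m \\ -\sigma R_m\sin\Theta_m & \sigma R_m\cos\Theta_m \end{vmatrix} = \sigma^2 R_m \tag{5-4-38}$$

Consequently,

$$p(R_1, \Theta_1) = \frac{R_1}{2\pi}\exp\left[-\frac{1}{2}\left(R_1^2 + \frac{2\mathcal{E}_s}{N_0}\right)\right] I_0\left(\sqrt{\frac{2\mathcal{E}_s}{N_0}}\,R_1\right) \tag{5-4-39}$$

$$p(R_m, \Theta_m) = \frac{R_m}{2\pi}\exp\left(-\frac{1}{2}R_m^2\right), \quad m = 2, 3, \ldots, M \tag{5-4-40}$$

Finally, by averaging $p(R_m, \Theta_m)$ over $\Theta_m$, the factor of $2\pi$ is eliminated from
(5-4-39) and (5-4-40). Thus, we find that $R_1$ has a Rice probability distribution
and $R_m$, $m = 2, 3, \ldots, M$, are each Rayleigh-distributed.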

The probability of a correct decision is simply the probability that $R_1 > R_2$,
and $R_1 > R_3$, ..., and $R_1 > R_M$. Hence,

$$P_c = P(R_2 < R_1, R_3 < R_1, \ldots, R_M < R_1)$$
$$\phantom{P_c} = \int_0^\infty P(R_2 < R_1, R_3 < R_1, \ldots, R_M < R_1 \mid R_1 = x)\,p_{R_1}(x)\,dx \tag{5-4-41}$$

Because the random variables $R_m$, $m = 2, 3, \ldots, M$, are statistically iid, the
joint probability in (5-4-41) conditioned on $R_1$ factors into a product of $M - 1$
identical terms. Thus,

$$P_c = \int_0^\infty [P(R_2 < R_1 \mid R_1 = x)]^{M-1}\,p_{R_1}(x)\,dx \tag{5-4-42}$$

where

$$P(R_2 < R_1 \mid R_1 = x) = \int_0^x p_{R_2}(r_2)\,dr_2 = 1 - e^{-x^2/2} \tag{5-4-43}$$

The $(M - 1)$th power of (5-4-43) may be expressed as

$$(1 - e^{-x^2/2})^{M-1} = \sum_{n=0}^{M-1} (-1)^n \binom{M-1}{n} e^{-nx^2/2} \tag{5-4-44}$$

Substitution of this result into (5-4-42) and integration over x yields the
probability of a correct decision as

$$P_c = \sum_{n=0}^{M-1} (-1)^n \binom{M-1}{n} \frac{1}{n+1}\exp\left[-\frac{n\mathcal{E}_s}{(n+1)N_0}\right] \tag{5-4-45}$$

where $\mathcal{E}_s/N_0$ is the SNR per symbol. Then, the probability of a symbol error,
which is $P_M = 1 - P_c$, becomes

$$P_M = \sum_{n=1}^{M-1} (-1)^{n+1} \binom{M-1}{n} \frac{1}{n+1}\exp\left[-\frac{nk\mathcal{E}_b}{(n+1)N_0}\right] \tag{5-4-46}$$

where $\mathcal{E}_b/N_0$ is the SNR per bit.

For binary orthogonal signals ($M = 2$), (5-4-46) reduces to the simple form

$$P_2 = \tfrac{1}{2}e^{-\mathcal{E}_b/2N_0} \tag{5-4-47}$$

For $M > 2$, we may compute the probability of a bit error by making use of
the relationship

$$P_b = \frac{2^{k-1}}{2^k - 1}P_M \tag{5-4-48}$$


which was established in Section 5-2. Figure 5-4-5 shows the bit-error
probability as a function of the SNR per bit $\gamma_b$ for M = 2, 4, 8, 16, and 32. Just
as in the case of coherent detection of M-ary orthogonal signals (see Section
5-2-2), we observe that for any given bit-error probability, the required SNR per
bit decreases as M increases. It will be shown in Chapter 7 that, in the limit as
$M \to \infty$ (or $k = \log_2 M \to \infty$), the probability of a bit error $P_b$ can be made
arbitrarily small provided that the SNR per bit is greater than the Shannon
limit of -1.6 dB. The cost for increasing M is the bandwidth required to
transmit the signals. For M-ary FSK, the frequency separation between
adjacent frequencies is $\Delta f = 1/T$ for signal orthogonality. The bandwidth
required for the M signals is $W = M\,\Delta f = M/T$. Also, the bit rate is $R = k/T$,
where $k = \log_2 M$. Therefore, the bit-rate-to-bandwidth ratio is

$$\frac{R}{W} = \frac{\log_2 M}{M} \tag{5-4-49}$$

FIGURE 5-4-5 Probability of a bit error for noncoherent detection of
orthogonal signals. [Horizontal axis: SNR per bit, $\gamma_b$ (dB).]
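
The trade-off between error rate and bandwidth can be tabulated directly from (5-4-46), (5-4-48), and (5-4-49). The sketch below is a straightforward evaluation of those formulas; the function names and the choice of 8 dB as the operating point are assumptions for illustration, and M is assumed to be a power of two.

```python
import math


def symbol_error_prob(M, ebn0_db):
    """P_M from (5-4-46) for noncoherent detection of M-ary orthogonal signals."""
    k = math.log2(M)
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return sum((-1) ** (n + 1) * math.comb(M - 1, n) / (n + 1)
               * math.exp(-n * k * ebn0 / (n + 1))
               for n in range(1, M))


def bit_error_prob(M, ebn0_db):
    """P_b from (5-4-48): P_b = 2**(k-1) / (2**k - 1) * P_M, with M = 2**k."""
    return (M / 2) / (M - 1) * symbol_error_prob(M, ebn0_db)


def rate_to_bandwidth(M):
    """R/W = log2(M) / M from (5-4-49)."""
    return math.log2(M) / M


for M in (2, 4, 8, 16, 32):
    print(M, bit_error_prob(M, 8.0), rate_to_bandwidth(M))
```

At a fixed SNR per bit the bit-error probability falls as M grows, while R/W shrinks, which is the bandwidth cost described above.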


5-4-4 Probability of Error for Envelope Detection
of Correlated Binary Signals

In this section, we consider the performance of the envelope detector for
binary, equal-energy correlated signals. When the two signals are correlated,
the inputs to the detector are the complex-valued random variables given by
(5-4-10). We assume that the detector bases its decision on the envelopes $|r_1|$
and $|r_2|$, which are correlated (statistically dependent). The marginal pdfs of
$R_1 = |r_1|$ and $R_2 = |r_2|$ are Ricean distributed, and may be expressed as

$$p(R_m) = \begin{cases} \dfrac{R_m}{2\mathcal{E}N_0}\exp\left(-\dfrac{R_m^2 + \beta_m^2}{4\mathcal{E}N_0}\right) I_0\left(\dfrac{\beta_m R_m}{2\mathcal{E}N_0}\right) & (R_m > 0) \\ 0 & (R_m < 0) \end{cases} \tag{5-4-50}$$

$m = 1, 2$, where $\beta_1 = 2\mathcal{E}$ and $\beta_2 = 2\mathcal{E}|\rho|$, based on the assumption that signal
$s_1(t)$ was transmitted.

Since $R_1$ and $R_2$ are statistically dependent as a consequence of the
nonorthogonality of the signals, the probability of error may be obtained by
evaluating the double integral

$$P_b = P(R_2 > R_1) = \int_0^\infty \int_{x_1}^\infty p(x_1, x_2)\,dx_2\,dx_1 \tag{5-4-51}$$

where $p(x_1, x_2)$ is the joint pdf of the envelopes $R_1$ and $R_2$. This approach was
first used by Helstrom (1955), who determined the joint pdf of $R_1$ and $R_2$ and
evaluated the double integral in (5-4-51).

An alternative approach is based on the observation that the probability of
error may also be expressed as

$$P_b = P(R_2 > R_1) = P(R_2^2 > R_1^2) = P(R_2^2 - R_1^2 > 0) \tag{5-4-52}$$

But $R_2^2 - R_1^2$ is a special case of a general quadratic form in complex-valued
gaussian random variables, treated later in Appendix B. For the special case
under consideration, the derivation yields the error probability in the form

$$P_b = Q_1(a, b) - \tfrac{1}{2}e^{-(a^2 + b^2)/2}\,I_0(ab) \tag{5-4-53}$$

where

$$a = \sqrt{\frac{\mathcal{E}_b}{2N_0}\left(1 - \sqrt{1 - |\rho|^2}\right)}, \qquad b = \sqrt{\frac{\mathcal{E}_b}{2N_0}\left(1 + \sqrt{1 - |\rho|^2}\right)} \tag{5-4-54}$$

$Q_1(a, b)$ is the Q function defined in (2-1-123) and $I_0(x)$ is the modified Bessel
function of order zero.

FIGURE 5-4-6 Probability of error for noncoherent detection.
[Horizontal axis: SNR per bit, $\gamma_b$ (dB).]

The error probability $P_b$ is illustrated in Fig. 5-4-6 for several values of $|\rho|$.
$P_b$ is minimized when $\rho = 0$; that is, when the signals are orthogonal. For this
case, $a = 0$, $b = \sqrt{\mathcal{E}_b/N_0}$, and (5-4-53) reduces to

$$P_b = Q_1\left(0, \sqrt{\mathcal{E}_b/N_0}\right) - \tfrac{1}{2}e^{-\mathcal{E}_b/2N_0} \tag{5-4-55}$$

From the definition of $Q_1(a, b)$ in (2-1-123), it follows that

$$Q_1\left(0, \sqrt{\mathcal{E}_b/N_0}\right) = e^{-\mathcal{E}_b/2N_0}$$

Substitution of this relation into (5-4-55) yields the desired result given
previously in (5-4-47). On the other hand, when $|\rho| = 1$, the error probability in
(5-4-53) becomes $P_b = \frac{1}{2}$, as expected.
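
The two limiting cases above can be verified numerically. The sketch below evaluates (5-4-53) with (5-4-54) using only the standard library. It is an illustration under stated assumptions: $Q_1(a, b)$ is computed from its representation as the tail of a noncentral chi-square distribution with two degrees of freedom (a Poisson-weighted sum), $I_0$ comes from its power series, the truncation lengths suit moderate SNR only, and all function names are hypothetical.

```python
import math


def bessel_i0(x, terms=80):
    """Power series for the modified Bessel function of order zero."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def marcum_q1(a, b, terms=200):
    """Q_1(a, b) as a Poisson-weighted sum of incomplete-gamma tails."""
    x, y = a * a / 2.0, b * b / 2.0
    weight = math.exp(-x)        # Poisson weight for n = 0
    term_y = math.exp(-y)        # e^{-y} y^k / k! for k = 0
    tail = term_y                # running sum over k = 0..n
    total = weight * tail
    for n in range(1, terms):
        weight *= x / n
        term_y *= y / n
        tail += term_y
        total += weight * tail
    return total


def pb_correlated(ebn0_db, rho_mag):
    """Error probability (5-4-53) with a and b taken from (5-4-54)."""
    g = 10.0 ** (ebn0_db / 10.0)
    root = math.sqrt(1.0 - rho_mag ** 2)
    a = math.sqrt(0.5 * g * (1.0 - root))
    b = math.sqrt(0.5 * g * (1.0 + root))
    return marcum_q1(a, b) - 0.5 * math.exp(-(a * a + b * b) / 2.0) * bessel_i0(a * b)


for rho in (0.0, 0.5, 0.9, 1.0):
    print(rho, pb_correlated(10.0, rho))
```

For $\rho = 0$ the result matches $\frac{1}{2}e^{-\mathcal{E}_b/2N_0}$, and for $|\rho| = 1$ it equals $\frac{1}{2}$, in agreement with the text.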


5-5 REGENERATIVE REPEATERS AND LINK 
BUDGET ANALYSIS 

In the transmission of digital signals through an AWGN channel, we have
observed that the performance of the communication system, measured in
terms of the probability of error, depends solely on the received SNR, $\mathcal{E}_b/N_0$,
where $\mathcal{E}_b$ is the transmitted energy per bit and $\frac{1}{2}N_0$ is the power spectral density
of the additive noise. Hence, the additive noise ultimately limits the
performance of the communication system.

FIGURE 5-5-1 Mathematical model of channel with attenuation
and additive noise.

In addition to the additive noise, another factor that affects the performance
of a communication system is channel attenuation. All physical channels,
including wire lines and radio channels, are lossy. Hence, the signal is
attenuated as it travels through the channel. The simple mathematical model
for the attenuation shown in Fig. 5-5-1 may be used for the channel.
Consequently, if the transmitted signal is $s(t)$, the received signal, with
$0 < \alpha \le 1$, is

$$r(t) = \alpha s(t) + n(t) \tag{5-5-1}$$

Then, if the energy in the transmitted signal is $\mathcal{E}_b$, the energy in the received
signal is $\alpha^2\mathcal{E}_b$. Consequently, the received signal has an SNR $\alpha^2\mathcal{E}_b/N_0$. Hence,
the effect of signal attenuation is to reduce the energy in the received signal
and thus to render the communication system more vulnerable to additive
noise.

In analog communication systems, amplifiers called repeaters are used to
periodically boost the signal strength in transmission through the channel.
However, each amplifier also boosts the noise in the system. In contrast, digital
communication systems allow us to detect and regenerate a clean (noise-free)
signal in a transmission channel. Such devices, called regenerative repeaters, are
frequently used in wireline and fiber optic communication channels.


5-5-1 Regenerative Repeaters 

The front end of each regenerative repeater consists of a demodulator/detector 
that demodulates and detects the transmitted digital information sequence sent 
by the preceding repeater. Once detected, the sequence is passed to the 
transmitter side of the repeater, which maps the sequence into signal 
waveforms that are transmitted to the next repeater. This type of repeater is 
called a regenerative repeater. 

Since a noise-free signal is regenerated at each repeater, the additive noise
does not accumulate. However, when errors occur in the detector of a
repeater, the errors are propagated forward to the following repeaters in the
channel. To evaluate the effect of errors on the performance of the overall
system, suppose that the modulation is binary PAM, so that the probability of
a bit error for one hop (signal transmission from one repeater to the next
repeater in the chain) is

$$P_b = Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right)$$

Since errors occur with low probability, we may ignore the probability that any
one bit will be detected incorrectly more than once in transmission through a
channel with K repeaters. Consequently, the number of errors will increase
linearly with the number of regenerative repeaters in the channel, and
therefore, the overall probability of error may be approximated as

$$P_b \approx KQ\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right) \tag{5-5-2}$$

In contrast, the use of K analog repeaters in the channel reduces the received
SNR by K, and hence, the bit error probability is

$$P_b \approx Q\left(\sqrt{\frac{2\mathcal{E}_b}{KN_0}}\right) \tag{5-5-3}$$

Clearly, for the same probability of error performance, the use of regenerative
repeaters results in a significant saving in transmitter power compared with
analog repeaters. Hence, in digital communication systems, regenerative
repeaters are preferable. However, in wireline telephone channels that are
used to transmit both analog and digital signals, analog repeaters are generally
employed.


Example 5-5-1 

A binary digital communication system transmits data over a wireline
channel of length 1000 km. Repeaters are used every 10 km to offset the
effect of channel attenuation. Let us determine the $\mathcal{E}_b/N_0$ that is required to
achieve a probability of a bit error of $10^{-5}$ if (a) analog repeaters are
employed, and (b) regenerative repeaters are employed.

The number of repeaters used in the system is K = 100. If regenerative
repeaters are used, the $\mathcal{E}_b/N_0$ obtained from (5-5-2) is

$$10^{-5} = 100\,Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right)$$
$$10^{-7} = Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right)$$

which yields approximately 11.3 dB. If analog repeaters are used, the $\mathcal{E}_b/N_0$
obtained from (5-5-3) is

$$10^{-5} = Q\left(\sqrt{\frac{2\mathcal{E}_b}{100N_0}}\right)$$

which yields $\mathcal{E}_b/N_0 = 29.6$ dB. Hence, the difference in the required SNR is
about 18.3 dB, or approximately 70 times the transmitter power of the
digital communication system.

FIGURE 5-5-2 Isotropically radiating antenna.
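
The numbers in Example 5-5-1 can be reproduced by inverting the Q function numerically. The sketch below uses bisection with `math.erfc`; the helper names (`q_function`, `q_inverse`, `required_ebn0_db`) are hypothetical and the search interval is an assumption that comfortably covers the error rates of interest.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_inverse(p, lo=0.0, hi=40.0):
    """Solve Q(x) = p by bisection; Q is decreasing on [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if q_function(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def required_ebn0_db(target_pb, num_repeaters, regenerative):
    """Eb/N0 in dB needed for target_pb with K repeaters."""
    if regenerative:
        x = q_inverse(target_pb / num_repeaters)   # (5-5-2): Pb ~ K Q(sqrt(2 Eb/N0))
        ebn0 = x * x / 2.0
    else:
        x = q_inverse(target_pb)                   # (5-5-3): Pb ~ Q(sqrt(2 Eb/(K N0)))
        ebn0 = num_repeaters * x * x / 2.0
    return 10.0 * math.log10(ebn0)


regen_db = required_ebn0_db(1e-5, 100, regenerative=True)
analog_db = required_ebn0_db(1e-5, 100, regenerative=False)
print(round(regen_db, 1), round(analog_db, 1), round(analog_db - regen_db, 1))
```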


5-5-2 Communication Link Budget Analysis 

In the design of radio communications systems that transmit over line-of-sight 
microwave channels and satellite channels, the system designer must specify 
the size of the transmit and receive antennas, the transmitted power, and the 
SNR required to achieve a given level of performance at some desired data 
rate. The system design procedure is relatively straightforward and is outlined 
below. 

Let us begin with a transmit antenna that radiates isotropically in free space 
at a power level of P T watts as shown in Fig. 5-5-2. The power density at a 
distance d from the antenna is P T l4nd 2 W/m 2 . If the transmitting antenna has 
some directivity in a particular direction, the power density in that direction is 
increased by a factor called the antenna gain and denoted by G T . In such. a 
case, the power density at distance d is P T G T /4nd 2 W/m 2 . The product P T G T is 
usually called the effective radiated power (ERP or EIRP), which is basically 
the radiated power relative to an isotropic antenna, for which G T = 1. 

A receiving antenna pointed in the direction of the radiated power gathers a 
portion of the power that is proportional to its cross-sectional area. Hence, the 
received power extracted by the antenna may be expressed as 


$$P_R = \frac{P_T G_T A_R}{4\pi d^2} \tag{5-5-4}$$


where $A_R$ is the effective area of the antenna. From electromagnetic field
theory, we obtain the basic relationship between the gain $G_R$ of an antenna and
its effective area as

$$A_R = \frac{G_R \lambda^2}{4\pi}\ \text{m}^2 \tag{5-5-5}$$

where $\lambda = c/f$ is the wavelength of the transmitted signal, $c$ is the speed of light
($3 \times 10^8$ m/s), and $f$ is the frequency of the transmitted signal.

If we substitute (5-5-5) for $A_R$ into (5-5-4), we obtain an expression for the
received power in the form

$$P_R = \frac{P_T G_T G_R}{(4\pi d/\lambda)^2} \tag{5-5-6}$$
FIGURE 5-5-3 Antenna beamwidth and pattern.


The factor

$$L_s = \left(\frac{\lambda}{4\pi d}\right)^2 \tag{5-5-7}$$

is called the free-space path loss. If other losses, such as atmospheric losses, are
encountered in the transmission of the signal, they may be accounted for by
introducing an additional loss factor, say $L_a$. Therefore, the received power
may be written in general as

$$P_R = P_T G_T G_R L_s L_a \tag{5-5-8}$$
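As a rough sketch of (5-5-4) through (5-5-8) in linear (not dB) units, the following helper functions compute the free-space path loss and the received power, and confirm numerically that the effective-area form and the path-loss form agree. The function names and the sample gains are illustrative assumptions, not values from the text.

```python
import math

C = 3.0e8  # speed of light in m/s (the rounded value quoted in the text)


def free_space_path_loss(d_m: float, f_hz: float) -> float:
    """L_s = (lambda / (4 pi d))^2, a dimensionless factor below 1."""
    wavelength = C / f_hz
    return (wavelength / (4.0 * math.pi * d_m)) ** 2


def received_power(pt_w, gt, gr, d_m, f_hz, la=1.0):
    """P_R = P_T G_T G_R L_s L_a, with la = 1 meaning no additional loss."""
    return pt_w * gt * gr * free_space_path_loss(d_m, f_hz) * la


def received_power_via_area(pt_w, gt, gr, d_m, f_hz):
    """Same quantity from P_R = P_T G_T A_R / (4 pi d^2), A_R = G_R lambda^2 / (4 pi)."""
    wavelength = C / f_hz
    a_r = gr * wavelength ** 2 / (4.0 * math.pi)
    return pt_w * gt * a_r / (4.0 * math.pi * d_m ** 2)


# Path loss over 36 000 km at 4 GHz, expressed in dB.
ls_db = 10.0 * math.log10(free_space_path_loss(36_000e3, 4e9))
print(f"L_s = {ls_db:.1f} dB")
```

Keeping the path loss as a factor smaller than one means that its dB value is negative, so it can simply be added to the other dB terms of a link budget.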

As indicated above, the important characteristics of an antenna are its gain 
and its effective area. These generally depend on the wavelength of the 
radiated power and the physical dimensions of the antenna. For example, a 
parabolic (dish) antenna of diameter $D$ has an effective area

$$A_R = \tfrac{1}{4}\pi D^2 \eta \tag{5-5-9}$$

where $\tfrac{1}{4}\pi D^2$ is the physical area and $\eta$ is the illumination efficiency factor,
which falls in the range $0.5 \le \eta \le 0.6$. Hence, the antenna gain for a parabolic
antenna of diameter $D$ is

$$G_R = \eta\left(\frac{\pi D}{\lambda}\right)^2 \tag{5-5-10}$$

As a second example, a horn antenna of physical area $A$ has an efficiency
factor of 0.8, an effective area of $A_R = 0.8A$, and an antenna gain of

$$G_R = \frac{10A}{\lambda^2} \tag{5-5-11}$$


Another parameter that is related to the gain (directivity) of an antenna is
its beamwidth, which we denote as $\Theta_B$ and which is illustrated graphically in
Fig. 5-5-3. Usually, the beamwidth is measured as the -3 dB width of the
antenna pattern. For example, the -3 dB beamwidth of a parabolic antenna is
approximately

$$\Theta_B = 70(\lambda/D)^\circ \tag{5-5-12}$$

so that $G_T$ is inversely proportional to $\Theta_B^2$. That is, a decrease of the beamwidth
by a factor of two, which is obtained by doubling the diameter $D$, increases the
antenna gain by a factor of four (6 dB).
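The gain and beamwidth relations (5-5-10) through (5-5-12) are easy to tabulate. The sketch below is only illustrative: the function names are hypothetical, and the horn gain uses the exact factor $4\pi \times 0.8 \approx 10.05$ that (5-5-11) rounds to 10.

```python
import math

C = 3.0e8  # speed of light in m/s (rounded value used in the text)


def parabolic_gain(diameter_m: float, f_hz: float, eta: float) -> float:
    """G = eta * (pi D / lambda)^2 for a dish of diameter D."""
    wavelength = C / f_hz
    return eta * (math.pi * diameter_m / wavelength) ** 2


def horn_gain(area_m2: float, f_hz: float) -> float:
    """G = 4 pi (0.8 A) / lambda^2, approximately 10 A / lambda^2."""
    wavelength = C / f_hz
    return 4.0 * math.pi * 0.8 * area_m2 / wavelength ** 2


def parabolic_beamwidth_deg(diameter_m: float, f_hz: float) -> float:
    """Approximate -3 dB beamwidth in degrees: 70 * lambda / D."""
    wavelength = C / f_hz
    return 70.0 * wavelength / diameter_m


for d in (3.0, 6.0):
    g_db = 10.0 * math.log10(parabolic_gain(d, 4e9, eta=0.5))
    bw = parabolic_beamwidth_deg(d, 4e9)
    print(f"D = {d} m: gain = {g_db:.1f} dB, beamwidth = {bw:.2f} deg")
```

Doubling the diameter halves the beamwidth and raises the gain by a factor of four, which is the 6 dB step noted in the text.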

Based on the general relationship for the received signal power given by
(5-5-8), the system designer can compute $P_R$ from a specification of the antenna
gains and the distance between the transmitter and the receiver. Such
computations are usually done on a power basis, so that

$$(P_R)_{\text{dB}} = (P_T)_{\text{dB}} + (G_T)_{\text{dB}} + (G_R)_{\text{dB}} + (L_s)_{\text{dB}} + (L_a)_{\text{dB}} \tag{5-5-13}$$

Example 5-5-2 

Suppose that we have a satellite in geosynchronous orbit (36 000 km above
the earth's surface) that radiates 100 W of power, i.e., 20 dB above 1 W
(20 dBW). The transmit antenna has a gain of 17 dB, so that the ERP =
37 dBW. Also, suppose that the earth station employs a 3 m parabolic
antenna and that the downlink is operating at a frequency of 4 GHz. The
efficiency factor is $\eta = 0.5$. By substituting these numbers into (5-5-10), we
obtain the value of the antenna gain as 39 dB. The free-space path loss is

$$L_s = 195.6\ \text{dB}$$

No other losses are assumed. Therefore, the received signal power is

$$(P_R)_{\text{dB}} = 20 + 17 + 39 - 195.6 = -119.6\ \text{dBW}$$

or, equivalently,

$$P_R = 1.1 \times 10^{-12}\ \text{W}$$
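The budget of Example 5-5-2 can be reproduced term by term in decibels, following (5-5-13). This is a minimal sketch; the variable names are illustrative, and the speed of light is taken as the rounded $3 \times 10^8$ m/s.

```python
import math

C = 3.0e8  # speed of light in m/s (rounded)


def to_db(x: float) -> float:
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(x)


wavelength = C / 4e9                  # 4 GHz downlink, 0.075 m
pt_dbw = to_db(100.0)                 # 100 W transmitter -> 20 dBW
gt_db = 17.0                          # satellite antenna gain
gr_db = to_db(0.5 * (math.pi * 3.0 / wavelength) ** 2)     # 3 m dish, eta = 0.5
ls_db = to_db((wavelength / (4.0 * math.pi * 36_000e3)) ** 2)  # path loss (negative)
la_db = 0.0                           # no other losses assumed

pr_dbw = pt_dbw + gt_db + gr_db + ls_db + la_db
pr_w = 10.0 ** (pr_dbw / 10.0)
print(f"G_R = {gr_db:.1f} dB, L_s = {ls_db:.1f} dB")
print(f"P_R = {pr_dbw:.1f} dBW = {pr_w:.2e} W")
```

Because the unrounded gain and path loss are carried through, the result may differ from the hand calculation in the last displayed digit.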

To complete the link budget computation, we must also consider the effect 
of the additive noise at the receiver front end. Thermal noise that arises at the 
receiver front end has a relatively flat power density spectrum up to about
$10^{12}$ Hz, and is given as

$$N_0 = k_B T_0\ \text{W/Hz} \tag{5-5-14}$$

where $k_B$ is Boltzmann's constant ($1.38 \times 10^{-23}$ W s/K) and $T_0$ is the noise
temperature in kelvin. Therefore, the total noise power in the signal
bandwidth $W$ is $N_0 W$.

The performance of the digital communications system is specified by the
$\mathcal{E}_b/N_0$ required to keep the error rate performance below some given value.
Since

$$\frac{\mathcal{E}_b}{N_0} = \frac{T_b P_R}{N_0} = \frac{1}{R}\frac{P_R}{N_0} \tag{5-5-15}$$

where $T_b = 1/R$ is the bit interval and $R$ is the bit rate, it follows that

$$\frac{P_R}{N_0} = R\left(\frac{\mathcal{E}_b}{N_0}\right)_{\text{req}} \tag{5-5-16}$$

where $(\mathcal{E}_b/N_0)_{\text{req}}$ is the required SNR per bit. Hence, if we have $P_R/N_0$ and the
required SNR per bit, we can determine the maximum data rate that is
possible.


Example 5-5-3 

For the link considered in Example 5-5-2, the received signal power is

$$P_R = 1.1 \times 10^{-12}\ \text{W} \quad (-119.6\ \text{dBW})$$

Now, suppose the receiver front end has a noise temperature of 300 K,
which is typical for a receiver in the 4 GHz range. Then

$$N_0 = 4.1 \times 10^{-21}\ \text{W/Hz}$$

or, equivalently, -203.9 dBW/Hz. Therefore,

$$\frac{P_R}{N_0} = -119.6 + 203.9 = 84.3\ \text{dB Hz}$$

If the required SNR per bit is 10 dB then, from (5-5-16), we have the
available rate as

$$R_{\text{dB}} = 84.3 - 10 = 74.3\ \text{dB (with respect to 1 bit/s)}$$

This corresponds to a rate of 26.9 megabits/s, which is equivalent to about
420 PCM channels, each operating at 64 000 bits/s.
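The rate calculation of Example 5-5-3 follows directly from (5-5-14) and (5-5-16). The sketch below uses illustrative function names and computes $N_0$ from $k_B T_0$ without intermediate rounding, so the resulting rate comes out slightly below the rounded figure quoted above.

```python
import math

K_B = 1.38e-23  # Boltzmann's constant in W s/K, as quoted in the text


def noise_density_dbw_hz(t0_kelvin: float) -> float:
    """N_0 = k_B T_0, expressed in dBW/Hz."""
    return 10.0 * math.log10(K_B * t0_kelvin)


def max_rate_db(pr_dbw: float, n0_dbw_hz: float, ebn0_req_db: float) -> float:
    """R in dB relative to 1 bit/s: (P_R/N_0) in dB Hz minus required Eb/N0 in dB."""
    return pr_dbw - n0_dbw_hz - ebn0_req_db


n0_db = noise_density_dbw_hz(300.0)
r_db = max_rate_db(-119.6, n0_db, 10.0)
rate_bps = 10.0 ** (r_db / 10.0)
print(f"N_0 = {n0_db:.1f} dBW/Hz")
print(f"R   = {r_db:.1f} dB -> {rate_bps / 1e6:.1f} Mbit/s")
print(f"64 kbit/s PCM channels: {rate_bps / 64_000:.0f}")
```

A few hundredths of a decibel in $N_0$ shift the rate by several hundred kilobits per second, which is why link budgets are normally quoted to about 0.1 dB.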


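It is a good idea to introduce some safety margin, which we shall call the
link margin $M_{\text{dB}}$, in the above computations for the capacity of the
communication link. Typically, this may be selected as $M_{\text{dB}} = 6$ dB. Then, the link
budget computation for the link capacity may be expressed in the simple form

$$\begin{aligned}
R_{\text{dB}} &= \left(\frac{P_R}{N_0}\right)_{\text{dB Hz}} - \left(\frac{\mathcal{E}_b}{N_0}\right)_{\text{req}} - M_{\text{dB}} \\
&= (P_T)_{\text{dBW}} + (G_T)_{\text{dB}} + (G_R)_{\text{dB}} + (L_a)_{\text{dB}} + (L_s)_{\text{dB}} - (N_0)_{\text{dBW/Hz}} - \left(\frac{\mathcal{E}_b}{N_0}\right)_{\text{req}} - M_{\text{dB}}
\end{aligned} \tag{5-5-17}$$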
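Equation (5-5-17) is a plain sum of decibel terms, so it maps onto a one-line function. The sketch below is illustrative only: the function name is hypothetical, the losses are entered as negative dB values, and the inputs are the rounded figures of Examples 5-5-2 and 5-5-3 with a 6 dB margin.

```python
import math


def link_rate_db(pt_dbw, gt_db, gr_db, ls_db, la_db,
                 n0_dbw_hz, ebn0_req_db, margin_db):
    """Available bit rate in dB relative to 1 bit/s, following (5-5-17).

    ls_db and la_db are the dB values of the loss factors (negative numbers).
    """
    return (pt_dbw + gt_db + gr_db + ls_db + la_db
            - n0_dbw_hz - ebn0_req_db - margin_db)


r_db = link_rate_db(pt_dbw=20.0, gt_db=17.0, gr_db=39.0,
                    ls_db=-195.6, la_db=0.0,
                    n0_dbw_hz=-203.9, ebn0_req_db=10.0, margin_db=6.0)
rate_bps = 10.0 ** (r_db / 10.0)
print(f"R = {r_db:.1f} dB -> {rate_bps / 1e6:.2f} Mbit/s")
```

A 6 dB margin costs roughly a factor of four in rate compared with the same link computed without margin.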


BIBLIOGRAPHICAL NOTES AND REFERENCES 

In the derivation of the optimum demodulator for a signal corrupted by 
AWGN, we applied mathematical techniques that were originally used in 
deriving optimum receiver structures for radar signals. For example, the 
matched filter was first proposed by North (1943) for use in radar detection,
and is sometimes called the North filter. An alternative method for deriving 
the optimum demodulator and detector is the Karhunen-Loeve expansion, 
which is described in the classical texts by Davenport and Root (1958), 
Helstrom (1968), and Van Trees (1968). Its use in radar detection theory is 
described in the paper by Kelly et al. (1960). These detection methods are 
based on the hypothesis testing methods developed by statisticians, e.g., 
Neyman and Pearson (1933) and Wald (1947). 

The geometric approach to signal design and detection, which was presented
in the context of digital modulation and which has its roots in Shannon's
original work, is conceptually appealing and has been widely used since its
introduction in the text by Wozencraft and Jacobs (1965).

Design and analysis of signal constellations for the AWGN channel have 
received considerable attention in the technical literature. Of particular 
significance is the performance analysis of two-dimensional (QAM) signal 
constellations that has been treated in the papers of Cahn (1960), Hancock and 
Lucky (1960), Campopiano and Glazer (1962), Lucky and Hancock (1962), 
Salz et al. (1971), Simon and Smith (1973), Thomas et al. (1974), and Foschini 
et al. (1974). Signal design based on multidimensional signal constellations has 
been described and analyzed in the paper by Gersho and Lawrence (1984). 

The Viterbi algorithm was devised by Viterbi (1967) for the purpose of 
decoding convolutional codes. Its use as the optimal maximum-likelihood 
sequence detection algorithm for signals with memory was described by Forney 
(1972) and Omura (1971). Its use for carrier modulated signals was considered 
by Ungerboeck (1974) and MacKenchnie (1973). It was subsequently applied
to the demodulation of CPM by Aulin and Sundberg (1981a, b) and others. 


PROBLEMS 


5-1 A matched filter has the frequency response

$$H(f) = \frac{1 - e^{-j2\pi fT}}{j2\pi f}$$

a Determine the impulse response $h(t)$ corresponding to $H(f)$.
b Determine the signal waveform to which the filter characteristic is matched.
5-2 Consider the signal

$$s(t) = \begin{cases} (A/T)\,t\cos 2\pi f_c t & (0 \le t \le T) \\ 0 & (\text{otherwise}) \end{cases}$$

a Determine the impulse response of the matched filter for the signal.
b Determine the output of the matched filter at $t = T$.

c Suppose the signal $s(t)$ is passed through a correlator that correlates the input
$s(t)$ with $s(t)$. Determine the value of the correlator output at $t = T$. Compare
your result with that in (b).
5-3 This problem deals with the characteristics of a DPSK signal.
a Suppose we wish to transmit the data sequence

110100010110

by binary DPSK. Let $s(t) = A\cos(2\pi f_c t + \theta)$ represent the transmitted signal in
any signaling interval of duration $T$. Give the phase of the transmitted signal for
the data sequence. Begin with $\theta = 0$ for the phase of the first bit to be
transmitted.

b If the data sequence is uncorrelated, determine and sketch the power density 
spectrum of the signal transmitted by DPSK. 

5-4 A binary digital communication system employs the signals

$$s_0(t) = 0, \quad 0 \le t \le T$$

$$s_1(t) = A, \quad 0 \le t \le T$$

for transmitting the information. This is called on-off signaling. The demodulator
cross-correlates the received signal $r(t)$ with $s(t)$ and samples the output of the
correlator at $t = T$.

a Determine the optimum detector for an AWGN channel and the optimum
threshold, assuming that the signals are equally probable.
b Determine the probability of error as a function of the SNR. How does on-off
signaling compare with antipodal signaling?

5-5 The correlation metrics given by (5-1-44) are

$$C(\mathbf{r}, \mathbf{s}_m) = 2\sum_{n=1}^{N} r_n s_{mn} - \sum_{n=1}^{N} s_{mn}^2, \quad m = 1, 2, \ldots, M$$

where

$$r_n = \int_0^T r(t) f_n(t)\,dt$$

$$s_{mn} = \int_0^T s_m(t) f_n(t)\,dt$$

Show that the correlation metrics are equivalent to the metrics

$$C(\mathbf{r}, \mathbf{s}_m) = 2\int_0^T r(t) s_m(t)\,dt - \int_0^T s_m^2(t)\,dt$$

5-6 Consider the equivalent lowpass (complex-valued) signal $s_l(t)$, $0 \le t \le T$, with
energy

$$\mathcal{E} = \frac{1}{2}\int_0^T |s_l(t)|^2\,dt$$

Suppose that this signal is corrupted by AWGN, which is represented by its
equivalent lowpass form $z(t)$. Hence, the observed signal is

$$r_l(t) = s_l(t) + z(t), \quad 0 \le t \le T$$

The received signal is passed through a filter that has an (equivalent lowpass)
impulse response $h_l(t)$. Determine $h_l(t)$ so that the filter maximizes the SNR at its
output (at $t = T$).

5-7 Let $z(t) = x(t) + jy(t)$ be a complex-valued, zero-mean white gaussian noise
process with autocorrelation function $\phi_{zz}(\tau) = N_0\delta(\tau)$. Let $f_m(t)$, $m = 1, 2, \ldots, M$,
be a set of $M$ orthogonal equivalent lowpass waveforms defined on the interval
$0 \le t \le T$. Define

$$N_{mr} = \text{Re}\left[\int_0^T z(t) f_m^*(t)\,dt\right], \quad m = 1, 2, \ldots, M$$

a Determine the variance of $N_{mr}$.
b Show that $E(N_{mr}N_{kr}) = 0$ for $k \ne m$.

FIGURE P5-8

5-8 The two equivalent lowpass signals shown in Fig. P5-8 are used to transmit a 
binary sequence over an additive white gaussian noise channel. The received signal 
can be expressed as 

$$r_l(t) = s_i(t) + z(t), \quad 0 \le t \le T, \quad i = 1, 2$$

where $z(t)$ is a zero-mean gaussian noise process with autocorrelation function

$$\phi_{zz}(\tau) = \tfrac{1}{2}E[z^*(t)z(t + \tau)] = N_0\delta(\tau)$$

a Determine the transmitted energy in $s_1(t)$ and $s_2(t)$ and the cross-correlation
coefficient $\rho_{12}$.

b Suppose the receiver is implemented by means of coherent detection using two
matched filters, one matched to $s_1(t)$ and the other to $s_2(t)$. Sketch the
equivalent lowpass impulse responses of the matched filters.
c Sketch the noise-free response of the two matched filters when the transmitted
signal is $s_2(t)$.

d Suppose the receiver is implemented by means of two cross-correlators
(multipliers followed by integrators) in parallel. Sketch the output of each
integrator as a function of time for the interval $0 \le t \le T$ when the transmitted
signal is $s_2(t)$.

e Compare the sketches in (c) and (d). Are they the same? Explain briefly, 
f From your knowledge of the signal characteristics, give the probability of error 
for this binary communications system. 

5-9 Suppose that we have a complex-valued gaussian random variable $z = x + jy$,
where $(x, y)$ are statistically independent variables with zero mean and variance
$E(x^2) = E(y^2) = \sigma^2$. Let $r = z + m$, where $m = m_r + jm_i$, and define $r$ as

$$r = a + jb$$

Clearly, $a = x + m_r$ and $b = y + m_i$. Determine the following probability density
functions:

a $p(a, b)$;
b $p(u, \phi)$, where $u = \sqrt{a^2 + b^2}$ and $\phi = \tan^{-1}(b/a)$;
c $p(u)$.

Note: In (b) it is convenient to define $\theta = \tan^{-1}(m_i/m_r)$ so that

$$m_r = \sqrt{m_r^2 + m_i^2}\,\cos\theta, \qquad m_i = \sqrt{m_r^2 + m_i^2}\,\sin\theta$$

Furthermore, you must use the relation

$$\frac{1}{2\pi}\int_0^{2\pi} e^{\alpha\cos(\phi - \theta)}\,d\phi = I_0(\alpha) = \sum_{n=0}^{\infty} \frac{\alpha^{2n}}{2^{2n}(n!)^2}$$

where $I_0(\alpha)$ is the modified Bessel function of order zero.

5-10 A ternary communication system transmits one of three signals, $s(t)$, 0, or $-s(t)$,
every $T$ seconds. The received signal is either $r_l(t) = s(t) + z(t)$, $r_l(t) = z(t)$, or
$r_l(t) = -s(t) + z(t)$, where $z(t)$ is white gaussian noise with $E[z(t)] = 0$ and
$\phi_{zz}(\tau) = \tfrac{1}{2}E[z(t)z^*(\tau)] = N_0\delta(t - \tau)$. The optimum receiver computes the
correlation metric

$$U = \text{Re}\left[\int_0^T r_l(t) s^*(t)\,dt\right]$$

and compares $U$ with a threshold $A$ and a threshold $-A$. If $U > A$, the decision is
made that $s(t)$ was sent. If $U < -A$, the decision is made in favor of $-s(t)$. If
$-A < U < A$, the decision is made in favor of 0.

a Determine the three conditional probabilities of error: $P_e$ given that $s(t)$ was
sent, $P_e$ given that $-s(t)$ was sent, and $P_e$ given that 0 was sent.
b Determine the average probability of error $P_e$ as a function of the threshold $A$,
assuming that the three symbols are equally probable a priori.
c Determine the value of $A$ that minimizes $P_e$.

5-11 The two equivalent lowpass signals shown in Fig. P5-11 are used to transmit a 
binary information sequence. The transmitted signals, which are equally probable, 
are corrupted by additive zero-mean white gaussian noise having an equivalent 
lowpass representation z(t) with an autocorrelation function 

$$\phi_{zz}(\tau) = \tfrac{1}{2}E[z^*(t)z(t + \tau)] = N_0\delta(\tau)$$

a What is the transmitted signal energy? 

b What is the probability of a binary digit error if coherent detection is employed 
at the receiver? 

c What is the probability of a binary digit error if noncoherent detection is 
employed at the receiver? 

5-12 In Section 4-3-1 it was shown that the minimum frequency separation for
orthogonality of binary FSK signals with coherent detection is $\Delta f = 1/2T$.
However, a lower error probability is possible with coherent detection of FSK if $\Delta f$
is increased beyond $1/2T$. Show that the optimum value of $\Delta f$ is $0.715/T$ and
determine the probability of error for this value of $\Delta f$.

FIGURE P5-13

5-13 The equivalent lowpass waveforms for three signal sets are shown in Fig. P5-13. 
Each set may be used to transmit one of four equally probable messages over an 
additive white gaussian noise channel. The equivalent lowpass noise $z(t)$ has zero
mean and autocorrelation function $\phi_{zz}(\tau) = N_0\delta(\tau)$.

a Classify the signal waveforms in sets I, II, and III. In other words, state the 
category or class to which each signal set belongs, 
b What is the average transmitted energy for each signal set? 
c For signal set I, specify the average probability of error if the signals are 
detected coherently. 

d For signal set II, give a union bound on the probability of a symbol error if the 
detection is performed (i) coherently and (ii) noncoherently. 
e Is it possible to use noncoherent detection on signal set III? Explain, 
f Which signal set or signal sets would you select if you wished to achieve a ratio 
of bit rate to bandwidth ( R/W ) of at least 2. Briefly explain your answer. 

5-14 Consider a quaternary ($M = 4$) communication system that transmits, every $T$
seconds, one of four equally probable signals: $s_1(t)$, $-s_1(t)$, $s_2(t)$, $-s_2(t)$. The
signals $s_1(t)$ and $s_2(t)$ are orthogonal with equal energy. The additive noise is white
gaussian with zero mean and autocorrelation function $\phi_{zz}(\tau) = N_0\delta(\tau)$. The
demodulator consists of two filters matched to $s_1(t)$ and $s_2(t)$, and their outputs at
the sampling instant are $U_1$ and $U_2$. The detector bases its decision on the
following rule:

$$U_1 > |U_2| \Rightarrow s_1(t), \qquad U_1 < -|U_2| \Rightarrow -s_1(t)$$

$$U_2 > |U_1| \Rightarrow s_2(t), \qquad U_2 < -|U_1| \Rightarrow -s_2(t)$$

Since the signal set is biorthogonal, the error probability is given by $(1 - P_c)$, where
$P_c$ is given by (5-2-34). Express this error probability in terms of a single integral
and, thus, show that the symbol error probability for a biorthogonal signal set with
$M = 4$ is identical to that for four-phase PSK. Hint: A change in variables from $U_1$
and $U_2$ to $W_1 = U_1 + U_2$ and $W_2 = U_1 - U_2$ simplifies the problem.

FIGURE P5-15

5-15 The input $s(t)$ to a bandpass filter is

$$s(t) = \text{Re}\left[s_0(t)e^{j2\pi f_c t}\right]$$

where $s_0(t)$ is a rectangular pulse as shown in Fig. P5-15(a).

a Determine the output $y(t)$ of the bandpass filter for all $t \ge 0$ if the impulse
response of the filter is

$$g(t) = \text{Re}\left[2h(t)e^{j2\pi f_c t}\right]$$

where $h(t)$ is an exponential as shown in Fig. P5-15(b).
b Sketch the equivalent lowpass output of the filter.

c When would you sample the output of the filter if you wished to have the
maximum output at the sampling instant? What is the value of the maximum
output?

d Suppose that in addition to the input signal $s(t)$, there is additive white gaussian
noise

$$n(t) = \text{Re}\left[z(t)e^{j2\pi f_c t}\right]$$

where $\phi_{zz}(\tau) = N_0\delta(\tau)$. At the sampling instant determined in (c), the signal
sample is corrupted by an additive gaussian noise term. Determine its mean and
variance.

e What is the signal-to-noise ratio $\gamma$ of the sampled output?
f Determine the signal-to-noise ratio when $h(t)$ is the matched filter to $s(t)$ and
compare this result with the value of $\gamma$ obtained in (e).

5-16 Consider the octal signal point constellations in Fig. P5-16. 




FIGURE P5-16 Octal signal point constellations: 8-PSK and 8-QAM.

FIGURE P5-19


a The nearest-neighbor signal points in the 8-QAM signal constellation are 
separated in distance by A units. Determine the radii a and b of the inner and 
outer circles. 

b The adjacent signal points in the 8-PSK are separated by a distance of A units. 
Determine the radius r of the circle. 

c Determine the average transmitter powers for the two signal constellations and 
compare the two powers. What is the relative power advantage of one 
constellation over the other? (Assume that all signal points are equally 
probable.) 

5-17 Consider the 8-point QAM signal constellation shown in Fig. P5-16. 

a Is it possible to assign three data bits to each point of the signal constellation 
such that nearest (adjacent) points differ in only one bit position? 
b Determine the symbol rate if the desired bit rate is 90 Mbits/s. 

5-18 Suppose that binary PSK is used for transmitting information over an AWGN channel with
a power spectral density of $\tfrac{1}{2}N_0 = 10^{-10}$ W/Hz. The transmitted signal energy is
$\mathcal{E}_b = \tfrac{1}{2}A^2T$, where $T$ is the bit interval and $A$ is the signal amplitude. Determine
the signal amplitude required to achieve an error probability of $10^{-6}$ when the data
rate is (a) 10 kbits/s, (b) 100 kbits/s, and (c) 1 Mbit/s.

5-19 Consider a signal detector with an input 

r = ±A + n 

where +A and -A occur with equal probability and the noise variable n is 
characterized by the (Laplacian) pdf shown in Fig. P5-19. 
a Determine the probability of error as a function of the parameters $A$ and $\sigma$.
b Determine the SNR required to achieve an error probability of $10^{-5}$. How does
the SNR compare with the result for a Gaussian pdf?

5-20 Consider the two 8-point QAM signal constellations shown in Fig. P5-20. The 
minimum distance between adjacent points is 2 A. Determine the average 
transmitted power for each constellation, assuming that the signal points are 
equally probable. Which constellation is more power-efficient? 

5-21 For the QAM signal constellation shown in Fig. P5-21, determine the optimum 
decision boundaries for the detector, assuming that the SNR is sufficiently high so 
that errors only occur between adjacent points. 






FIGURE P5-20

FIGURE P5-21



5-22 Specify a Gray code for the 16-QAM signal constellation shown in Fig. P5-21. 

5-23 Two quadrature carriers $\cos 2\pi f_c t$ and $\sin 2\pi f_c t$ are used to transmit digital
information through an AWGN channel at two different data rates, 10 kbits/s and
100 kbits/s. Determine the relative amplitudes of the signals for the two carriers so
that the $\mathcal{E}_b/N_0$ for the two channels is identical.

5-24 Three messages $m_1$, $m_2$, and $m_3$ are to be transmitted over an AWGN channel
with noise power spectral density $\tfrac{1}{2}N_0$. The messages are

$$s_1(t) = \begin{cases} 1 & (0 \le t \le T) \\ 0 & (\text{otherwise}) \end{cases}$$

$$s_2(t) = -s_3(t) = \begin{cases} 1 & (0 \le t \le \tfrac{1}{2}T) \\ -1 & (\tfrac{1}{2}T \le t \le T) \\ 0 & (\text{otherwise}) \end{cases}$$

a What is the dimensionality of the signal space?

b Find an appropriate basis for the signal space. [Hint: You can find the basis
without using the Gram-Schmidt procedure.]
c Draw the signal constellation for this problem.
d Derive and sketch the optimal decision regions $R_1$, $R_2$, and $R_3$.
e Which of the three messages is more vulnerable to errors and why? In other
words, which of $P(\text{error} \mid m_i \text{ transmitted})$, $i = 1, 2, 3$, is larger?

5-25 When the additive noise at the input to the demodulator is colored, the filter 
matched to the signal no longer maximizes the output SNR. In such a case we may 
consider the use of a prefilter that "whitens" the colored noise. The prefilter is
followed by a filter matched to the prefiltered signal. Towards this end, consider
the configuration shown in Fig. P5-25.

a Determine the frequency response characteristic of the prefilter that whitens the
noise.
b Determine the frequency response characteristic of the filter matched to $s(t)$.
c Consider the prefilter and the matched filter as a single "generalized matched
filter." What is the frequency response characteristic of this filter?
d Determine the SNR at the input to the detector.

FIGURE P5-25

5-26 Consider a digital communication system that transmits information via QAM 
over a voice-band telephone channel at a rate 2400 symbols/s. The additive noise 
is assumed to be white and gaussian. 

a Determine the $\mathcal{E}_b/N_0$ required to achieve an error probability of $10^{-5}$ at
4800 bits/s.

b Repeat (a) for a rate of 9600 bits/s.

c Repeat (a) for a rate of 19 200 bits/s.
d What conclusions do you reach from these results? 

5-27 Consider the four-phase and eight-phase signal constellations shown in Fig. P5-27. 
Determine the radii $r_1$ and $r_2$ of the circles such that the distance between two
adjacent points in the two constellations is $d$. From this result, determine the
additional transmitted energy required in the 8-PSK signal to achieve the same
error probability as the four-phase signal at high SNR, where the probability of 
error is determined by errors in selecting adjacent points. 

5-28 Digital information is to be transmitted by carrier modulation through an additive 
gaussian noise channel with a bandwidth of 100 kHz and $N_0 = 10^{-10}$ W/Hz.
Determine the maximum rate that can be transmitted through the channel for 
four-phase PSK, binary FSK, and four-frequency orthogonal FSK, which is 
detected noncoherently. 

5-29 In an MSK signal, the initial state for the phase is either 0 or $\pi$ rad. Determine the
terminal phase state for the following four pairs of input data: (a) 00; (b) 01;
(c) 10; (d) 11.

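5-30 A continuous-phase FSK signal with $h = \tfrac{1}{2}$ is represented as

$$s(t) = \pm\sqrt{\frac{2\mathcal{E}_b}{T_b}}\cos\left(\frac{\pi t}{2T_b}\right)\cos 2\pi f_c t \pm \sqrt{\frac{2\mathcal{E}_b}{T_b}}\sin\left(\frac{\pi t}{2T_b}\right)\sin 2\pi f_c t, \quad 0 \le t \le 2T_b$$

where the $\pm$ signs depend on the information bits transmitted.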
a Show that this signal has constant amplitude. 

b Sketch a block diagram of the modulator for synthesizing the signal, 
c Sketch a block diagram of the demodulator and detector for recovering the 
information. 

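5-31 Sketch the phase tree, the state trellis, and the state diagram for partial-response
CPM with $h = \tfrac{1}{2}$ and

$$u(t) = \begin{cases} 1/4T & (0 \le t \le 2T) \\ 0 & (\text{otherwise}) \end{cases}$$

FIGURE P5-27 Four-phase ($M = 4$) and eight-phase signal constellations.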
5-32 Determine the number of terminal phase states in the state trellis diagram for (a) a
full-response binary CPFSK with either $h = \tfrac{2}{3}$ or $\tfrac{3}{4}$ and (b) a partial-response $L = 3$
binary CPFSK with either $h = \tfrac{2}{3}$ or $\tfrac{3}{4}$.

5-33 Consider a biorthogonal signal set with M = 8 signal points. Determine a union 
bound for the probability of a symbol error as a function of $\mathcal{E}_b/N_0$. The signal
points are equally likely a priori. 

5-34 Consider an $M$-ary digital communication system where $M = 2^N$, and $N$ is the
dimension of the signal space. Suppose that the $M$ signal vectors lie on the vertices
of a hypercube that is centered at the origin. Determine the average probability of
a symbol error as a function of $\mathcal{E}_s/N_0$, where $\mathcal{E}_s$ is the energy per symbol, $\tfrac{1}{2}N_0$ is the
power spectral density of the AWGN, and all signal points are equally probable.

5-35 Consider the signal waveform

$$s(t) = \sum_{k=1}^{n} c_k\,p(t - kT_c)$$

where $p(t)$ is a rectangular pulse of unit amplitude and duration $T_c$. The $\{c_k\}$ may
be viewed as a code vector $\mathbf{C} = [c_1\ c_2\ \ldots\ c_n]$, where the elements $c_k = \pm 1$. Show
that the filter matched to the waveform $s(t)$ may be realized as a cascade of a filter
matched to $p(t)$ followed by a discrete-time filter matched to the vector $\mathbf{C}$.
Determine the value of the output of the matched filter at the sampling instant
$t = nT_c$.

5-36 A speech signal is sampled at a rate of 8 kHz, logarithmically compressed and 
encoded into a PCM format using 8 bits/sample. The PCM data is transmitted
through an AWGN baseband channel via $M$-level PAM. Determine the bandwidth
required for transmission when (a) $M = 4$, (b) $M = 8$, and (c) $M = 16$.

5-37 A Hadamard matrix is defined as a matrix whose elements are $\pm 1$ and whose row
vectors are pairwise orthogonal. In the case when $n$ is a power of 2, an $n \times n$
Hadamard matrix is constructed by means of the recursion

$$\mathbf{H}_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad \mathbf{H}_{2n} = \begin{bmatrix} \mathbf{H}_n & \mathbf{H}_n \\ \mathbf{H}_n & -\mathbf{H}_n \end{bmatrix}$$

a Let $\mathbf{C}_i$ denote the $i$th row of an $n \times n$ Hadamard matrix as defined above. Show
that the waveforms constructed as

$$s_i(t) = \sum_{k=1}^{n} c_{ik}\,p(t - kT_c), \quad i = 1, 2, \ldots, n$$

are orthogonal, where $p(t)$ is an arbitrary pulse confined to the time interval
$0 \le t \le T_c$.

b Show that the matched filters (or cross-correlators) for the $n$ waveforms $\{s_i(t)\}$
can be realized by a single filter (or correlator) matched to the pulse $p(t)$
followed by a set of $n$ cross-correlators using the code words $\{\mathbf{C}_i\}$.
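For readers who want to experiment with the recursion in Problem 5-37, the following sketch builds the matrix by repeated doubling. It is an illustration only: the function name `hadamard` is hypothetical, and the recursion is started from the $1 \times 1$ matrix $[1]$, an assumption that reproduces the $\mathbf{H}_2$ given above after one step. A numerical check of row orthogonality is not a substitute for the proof the problem asks for.

```python
def hadamard(n: int) -> list[list[int]]:
    """Build an n x n Hadamard matrix (n a power of 2) by the doubling recursion."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a positive power of 2")
    h = [[1]]  # assumed starting point; one doubling step yields H_2
    while len(h) < n:
        top = [row + row for row in h]                   # [H_n  H_n]
        bottom = [row + [-x for x in row] for row in h]  # [H_n -H_n]
        h = top + bottom
    return h


def dot(u: list[int], v: list[int]) -> int:
    """Inner product of two code words."""
    return sum(a * b for a, b in zip(u, v))


h8 = hadamard(8)
for row in h8:
    print(" ".join(f"{x:+d}" for x in row))

# Distinct rows have zero inner product; each row has squared norm n.
rows_orthogonal = all(
    dot(h8[i], h8[j]) == (8 if i == j else 0)
    for i in range(8) for j in range(8)
)
print("rows orthogonal:", rows_orthogonal)
```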

5-38 The discrete sequence

$$r_k = \sqrt{\mathcal{E}_b}\,c_k + n_k, \quad k = 1, 2, \ldots, n$$

represents the output sequence of samples from a demodulator, where $c_k = \pm 1$ are
elements of one of two possible code words, $\mathbf{C}_1 = [1\ 1\ \ldots\ 1]$ and $\mathbf{C}_2 =
[1\ 1\ \ldots\ 1\ {-1}\ \ldots\ {-1}]$. The code word $\mathbf{C}_2$ has $w$ elements that are $+1$ and $n - w$
elements that are $-1$, where $w$ is some positive integer. The noise sequence $\{n_k\}$ is
white gaussian with variance $\sigma^2$.

a What is the optimum maximum likelihood detector for the two possible
transmitted signals?

b Determine the probability of error as a function of the parameters $(\sigma^2, \mathcal{E}_b, w)$.
c What is the value of $w$ that minimizes the error probability?

5-39 Derive the outputs $r_1$ and $r_2$ of the two correlators shown in Fig. 5-4-1. Assume
that a signal $s_{l1}(t)$ is transmitted and that

$$r_l(t) = s_{l1}(t)e^{j\phi} + z(t)$$

where $z(t) = n_c(t) + jn_s(t)$ is the additive gaussian noise.

5-40 Determine the covariances and variances of the gaussian random noise variables
$n_{1c}$, $n_{2c}$, $n_{1s}$, and $n_{2s}$ in (5-4-15) and the joint pdf.

5-41 Derive the matched filter outputs given by (5-4-10). 

5-42 In on-off keying of a carrier-modulated signal, the two possible signals are

$$s_0(t) = 0, \quad 0 \le t \le T_b$$

$$s_1(t) = \sqrt{\frac{2\mathcal{E}_b}{T_b}}\cos 2\pi f_c t, \quad 0 \le t \le T_b$$

The corresponding received signals are

$$r(t) = n(t), \quad 0 \le t \le T_b$$

$$r(t) = \sqrt{\frac{2\mathcal{E}_b}{T_b}}\cos(2\pi f_c t + \phi) + n(t), \quad 0 \le t \le T_b$$

where $\phi$ is the carrier phase and $n(t)$ is AWGN.

a Sketch a block diagram of the receiver (demodulator and detector) that employs 
noncoherent (envelope) detection. 

b Determine the pdfs for the two possible decision variables at the detector 
corresponding to the two possible received signals, 
c Derive the probability of error for the detector. 

5-43 In two-phase DPSK, the received signal in one signaling interval is used as a phase
reference for the received signal in the following signaling interval. The decision
variable is

$$D = \text{Re}(V_m V_{m-1}^*) \underset{\text{"0"}}{\overset{\text{"1"}}{\gtrless}} 0$$

where

$$V_k = 2\alpha\mathcal{E}e^{j(\theta_k - \phi)} + N_k$$

represents the complex-valued output of the filter matched to the transmitted
signal $u(t)$. $N_k$ is a complex-valued gaussian variable having zero mean and
statistically independent components.

a Writing $V_k = X_k + jY_k$, show that $D$ is equivalent to

$$D = \left[\tfrac{1}{2}(X_m + X_{m-1})\right]^2 + \left[\tfrac{1}{2}(Y_m + Y_{m-1})\right]^2 - \left[\tfrac{1}{2}(X_m - X_{m-1})\right]^2 - \left[\tfrac{1}{2}(Y_m - Y_{m-1})\right]^2$$

b For mathematical convenience, suppose that $\theta_k = \theta_{k-1}$. Show that the random
variables $U_1$, $U_2$, $U_3$, and $U_4$ are statistically independent gaussian variables,
where $U_1 = \tfrac{1}{2}(X_m + X_{m-1})$, $U_2 = \tfrac{1}{2}(Y_m + Y_{m-1})$, $U_3 = \tfrac{1}{2}(X_m - X_{m-1})$, and $U_4 =
\tfrac{1}{2}(Y_m - Y_{m-1})$.
c Define the random variables $W_1 = U_1^2 + U_2^2$ and $W_2 = U_3^2 + U_4^2$. Then

$$D = W_1 - W_2 \underset{\text{"0"}}{\overset{\text{"1"}}{\gtrless}} 0$$

Determine the probability density functions for $W_1$ and $W_2$.
d Determine the probability of error $P_b$, where

$$P_b = P(D < 0) = P(W_1 - W_2 < 0) = \int_0^{\infty} P(W_2 > w_1 \mid w_1)\,p(w_1)\,dw_1$$


5-44 Recall that MSK can be represented as a four-phase offset PSK modulation having
the lowpass equivalent form

$$v(t) = \sum_k \left[I_k u(t - 2kT_b) + jJ_k u(t - 2kT_b - T_b)\right]$$

where

$$u(t) = \begin{cases} \sin(\pi t/2T_b) & (0 \le t \le 2T_b) \\ 0 & (\text{otherwise}) \end{cases}$$

and $\{I_k\}$ and $\{J_k\}$ are sequences of information symbols ($\pm 1$).
a Sketch the block diagram of an MSK demodulator for offset QPSK.
b Evaluate the performance of the four-phase demodulator for AWGN if no
account is taken of the memory in the modulation.
c Compare the performance obtained in (b) with that for Viterbi decoding of the 
MSK signal. 

d The MSK signal is also equivalent to binary FSK. Determine the performance of 
noncoherent detection of the MSK signal. Compare your result with (b) 
and (c). 

5-45 Consider a transmission line channel that employs $n - 1$ regenerative repeaters
plus the terminal receiver in the transmission of binary information. Assume that
the probability of error at the detector of each receiver is $p$ and that errors among
repeaters are statistically independent.

a Show that the binary error probability at the terminal receiver is

$$P_n = \tfrac{1}{2}\left[1 - (1 - 2p)^n\right]$$

b If $p = 10^{-6}$ and $n = 100$, determine an approximate value of $P_n$.

5-46 A digital communication system consists of a transmission line with 100 digital 
(regenerative) repeaters. Binary antipodal signals are used for transmitting the 
information. If the overall end-to-end error probability is $10^{-6}$, determine the
probability of error for each repeater and the required $\mathcal{E}_b/N_0$ to achieve this
performance in AWGN.

5-47 A radio transmitter has a power output of $P_T = 1$ W at a frequency of 1 GHz. The
transmitting and receiving antennas are parabolic dishes with diameter D = 3 m. 
a Determine the antenna gains, 
b Determine the EIRP for the transmitter. 

c The distance (free space) between the transmitting and receiving antennas is 
20 km. Determine the signal power at the output of the receiving antenna in 
dBm. 
5-48 A radio communication system transmits at a power level of 0.1 W at 1 GHz. The
transmitting and receiving antennas are parabolic, each having a diameter of 1 m. 
The receiver is located 30 km from the transmitter, 
a Determine the gains of the transmitting and receiving antennas, 
b Determine the EIRP of the transmitted signal, 
c Determine the signal power from the receiving antenna. 

5-49 A satellite in synchronous orbit is used to communicate with an earth station at a 
distance of 40 000 km. The satellite has an antenna with a gain of 15 dB and a 
transmitter power of 3 W. The earth station uses a 10 m parabolic antenna with an 
efficiency of 0.6. The frequency band is at f = 10 GHz. Determine the received 
power level at the output of the receiver antenna. 

5-50 A spacecraft located 100 000 km from the earth is sending data at a rate of 
R bits/s. The frequency band is centered at 2 GHz and the transmitted power is 
10 W. The earth station uses a parabolic antenna, 50 m in diameter, and the 
spacecraft has an antenna with a gain of 10 dB. The noise temperature of the 
receiver front end is $T_0 = 300$ K. 

a Determine the received power level. 

b If the desired $\mathcal{E}_b/N_0 = 10$ dB, determine the maximum bit rate that the 
spacecraft can transmit. 

5-51 A satellite in geosynchronous orbit is used as a regenerative repeater in a digital 
communication system. Consider the satellite-to-earth link in which the satellite 
antenna has a gain of 6 dB and the earth station antenna has a gain of 50 dB. The 
downlink is operated at a center frequency of 4 GHz, and the signal bandwidth is 
1 MHz. If the required $\mathcal{E}_b/N_0$ for reliable communication is 15 dB, determine the 
transmitted power for the satellite downlink. Assume that $N_0 = 4.1 \times 10^{-21}$ W/Hz. 



6 


CARRIER AND SYMBOL 
SYNCHRONIZATION 


We have observed that in a digital communication system, the output of the 
demodulator must be sampled periodically, once per symbol interval, in order 
to recover the transmitted information. Since the propagation delay from the 
transmitter to the receiver is generally unknown at the receiver, symbol timing 
must be derived from the received signal in order to synchronously sample the 
output of the demodulator. 

The propagation delay in the transmitted signal also results in a carrier 
offset, which must be estimated at the receiver if the detector is phase- 
coherent. In this chapter, we consider methods for deriving carrier and symbol 
synchronization at the receiver. 

6-1 SIGNAL PARAMETER ESTIMATION 

Let us begin by developing a mathematical model for the signal at the input to 
the receiver. We assume that the channel delays the signals transmitted 
through it and corrupts them by the addition of gaussian noise. Hence, the 
received signal may be expressed as 

$$r(t) = s(t - \tau) + n(t)$$

where 

$$s(t) = \mathrm{Re}\left[s_l(t)e^{j2\pi f_c t}\right] \tag{6-1-1}$$

and where $\tau$ is the propagation delay and $s_l(t)$ is the equivalent lowpass signal. 
The received signal may be expressed as 

$$r(t) = \mathrm{Re}\left\{\left[s_l(t - \tau)e^{j\phi} + z(t)\right]e^{j2\pi f_c t}\right\} \tag{6-1-2}$$

where the carrier phase $\phi$, due to the propagation delay $\tau$, is $\phi = -2\pi f_c\tau$. 

Now, from this formulation, it may appear that there is only one signal 
parameter to be estimated, namely, the propagation delay, since one can 
determine $\phi$ from knowledge of $f_c$ and $\tau$. However, this is not the case. First of 
all, the oscillator that generates the carrier signal for demodulation at the 
receiver is generally not synchronous in phase with that at the transmitter. 
Furthermore, the two oscillators may be drifting slowly with time, perhaps in 
different directions. Consequently, the received carrier phase is not only 
dependent on the time delay $\tau$. Furthermore, the precision to which one must 
synchronize in time for the purpose of demodulating the received signal depends 
on the symbol interval $T$. Usually, the error in estimating $\tau$ must be 
a relatively small fraction of $T$. For example, ±1% of $T$ is adequate for 
practical applications. However, this level of precision is generally inadequate 
for estimating the carrier phase, even if $\phi$ depends only on $\tau$. This is due to the 
fact that $f_c$ is generally large, and, hence, a small estimation error in $\tau$ causes a 
large phase error. 
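
To make the last point concrete, the following sketch evaluates the phase error 
$2\pi f_c\,\Delta\tau$ that results from a timing error $\Delta\tau$ equal to 1% of $T$. The carrier 
frequency (1 GHz) and symbol interval (1 microsecond) are illustrative assumptions, 
not values taken from the text, and the function name is hypothetical. 

```python
import math

def phase_error_from_delay_error(fc_hz, delay_error_s):
    """Magnitude of the carrier phase error (radians) implied by
    phi = -2*pi*fc*tau when tau is in error by delay_error_s."""
    return 2.0 * math.pi * fc_hz * delay_error_s

# Assumed, illustrative values (not from the text).
fc_hz = 1.0e9                 # carrier frequency: 1 GHz
T_s = 1.0e-6                  # symbol interval: 1 microsecond
delay_error_s = 0.01 * T_s    # timing error of 1% of T

err_rad = phase_error_from_delay_error(fc_hz, delay_error_s)
err_cycles = err_rad / (2.0 * math.pi)
print(f"phase error = {err_rad:.2f} rad = {err_cycles:.1f} carrier cycles")
```

Under these assumptions, a timing error that is perfectly acceptable for symbol 
sampling corresponds to several full carrier cycles, so the delay estimate alone 
says essentially nothing about the carrier phase. 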

In effect, we must estimate both parameters $\tau$ and $\phi$ in order to demodulate 
and coherently detect the received signal. Hence, we may express the received 
signal as 

$$r(t) = s(t; \phi, \tau) + n(t) \tag{6-1-3}$$

where $\phi$ and $\tau$ represent the signal parameters to be estimated. To simplify the 
notation, we let $\psi$ denote the parameter vector $\{\phi, \tau\}$, so that $s(t; \phi, \tau)$ is 
simply denoted by $s(t; \psi)$. 

There are basically two criteria that are widely applied to signal parameter 
estimation: the maximum-likelihood (ML) criterion and the maximum a 
posteriori probability (MAP) criterion. In the MAP criterion, the signal 
parameter vector $\psi$ is modeled as random, and characterized by an a priori 
probability density function $p(\psi)$. In the maximum-likelihood criterion, the 
signal parameter vector $\psi$ is treated as deterministic but unknown. 

By performing an orthonormal expansion of $r(t)$ using $N$ orthonormal 
functions $\{f_n(t)\}$, we may represent $r(t)$ by the vector of coefficients 
$[r_1\ r_2\ \cdots\ r_N] \equiv \mathbf{r}$. The joint pdf of the random variables $[r_1\ r_2\ \cdots\ r_N]$ in the 
expansion can be expressed as $p(\mathbf{r} \mid \psi)$. Then, the ML estimate of $\psi$ is the 
value that maximizes $p(\mathbf{r} \mid \psi)$. On the other hand, the MAP estimate is the 
value of $\psi$ that maximizes the a posteriori probability density function 

$$p(\psi \mid \mathbf{r}) = \frac{p(\mathbf{r} \mid \psi)\,p(\psi)}{p(\mathbf{r})} \tag{6-1-4}$$

We note that if there is no prior knowledge of the parameter vector $\psi$, we 
may assume that $p(\psi)$ is uniform (constant) over the range of values of the 
parameters. In such a case, the value of $\psi$ that maximizes $p(\mathbf{r} \mid \psi)$ also 
maximizes $p(\psi \mid \mathbf{r})$. Therefore, the MAP and ML estimates are identical. 

In our treatment of parameter estimation given below, we view the 
parameters $\phi$ and $\tau$ as unknown, but deterministic. Hence, we adopt the ML 
criterion for estimating them. 

In the ML estimation of signal parameters, we require that the receiver 
extract the estimate by observing the received signal over a time interval 
$T_0 \ge T$, which is called the observation interval. Estimates obtained from a 
single observation interval are sometimes called one-shot estimates. In 
practice, however, the estimation is performed on a continuous basis by using 
tracking loops (either analog or digital) that continuously update the estimates. 
Nevertheless, one-shot estimates yield insight for tracking loop implementation. 
In addition, they prove useful in the analysis of the performance of ML 
estimation, and their performance can be related to that obtained with a 
tracking loop. 


6-1-1 The Likelihood Function 

Although it is possible to derive the parameter estimates based on the joint pdf 
of the random variables $[r_1\ r_2\ \cdots\ r_N]$ obtained from the expansion of $r(t)$, it is 
convenient to deal directly with the signal waveforms when estimating their 
parameters. Hence, we shall develop a continuous-time equivalent of the 
maximization of $p(\mathbf{r} \mid \psi)$. 

Since the additive noise $n(t)$ is white and zero-mean gaussian, the joint pdf 
$p(\mathbf{r} \mid \psi)$ may be expressed as 

$$p(\mathbf{r} \mid \psi) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{N} \exp\left\{-\sum_{n=1}^{N}\frac{[r_n - s_n(\psi)]^2}{2\sigma^2}\right\} \tag{6-1-5}$$

where 

$$r_n = \int_{T_0} r(t)f_n(t)\,dt, \qquad s_n(\psi) = \int_{T_0} s(t; \psi)f_n(t)\,dt \tag{6-1-6}$$

where $T_0$ represents the integration interval in the expansion of $r(t)$ and $s(t; \psi)$. 

We note that the argument in the exponent may be expressed in terms of 
the signal waveforms $r(t)$ and $s(t; \psi)$, by substituting from (6-1-6) into (6-1-5). 
That is, 

$$\lim_{N\to\infty}\frac{1}{2\sigma^2}\sum_{n=1}^{N}[r_n - s_n(\psi)]^2 = \frac{1}{N_0}\int_{T_0}[r(t) - s(t; \psi)]^2\,dt \tag{6-1-7}$$

where the proof is left as an exercise for the reader (see Problem 6-1). Now, 
the maximization of $p(\mathbf{r} \mid \psi)$ with respect to the signal parameters $\psi$ is 
equivalent to the maximization of the likelihood function 

$$\Lambda(\psi) = \exp\left\{-\frac{1}{N_0}\int_{T_0}[r(t) - s(t; \psi)]^2\,dt\right\} \tag{6-1-8}$$

Below, we shall consider signal parameter estimation from the viewpoint of 
maximizing $\Lambda(\psi)$. 

FIGURE 6-1-1 Block diagram of binary PSK receiver. 


6-1-2 Carrier Recovery and Symbol Synchronization 
in Signal Demodulation 

Symbol synchronization is required in every digital communication system 
which transmits information synchronously. Carrier recovery is required if the 
signal is detected coherently. 

Figure 6-1-1 illustrates the block diagram of a binary PSK (or binary PAM) 
signal demodulator and detector. As shown, the carrier phase estimate $\hat\phi$ is 
used in generating the reference signal $g(t)\cos(2\pi f_c t + \hat\phi)$ for the correlator. 
The symbol synchronizer controls the sampler and the output of the signal 
pulse generator. If the signal pulse is rectangular then the signal generator can 
be eliminated. 

The block diagram of an M-ary PSK demodulator is shown in Fig. 6-1-2. In 
this case, two correlators (or matched filters) are required to correlate the 
received signal with the two quadrature carrier signals $g(t)\cos(2\pi f_c t + \hat\phi)$ and 
$g(t)\sin(2\pi f_c t + \hat\phi)$, where $\hat\phi$ is the carrier phase estimate. The detector is now 
a phase detector, which compares the received signal phases with the possible 
transmitted signal phases. 

The block diagram of a PAM signal demodulator is shown in Fig. 6-1-3. In 
this case, a single correlator is required, and the detector is an amplitude 
detector, which compares the received signal amplitude with the possible 
transmitted signal amplitudes. Note that we have included an automatic gain 
control (AGC) at the front-end of the demodulator to eliminate channel gain 
variations, which would affect the amplitude detector. The AGC has a 
relatively long time constant, so that it does not respond to the signal 
amplitude variations that occur on a symbol-by-symbol basis. Instead, the 
AGC maintains a fixed average (signal plus noise) power at its output. 

Finally, we illustrate the block diagram of a QAM demodulator in Fig. 
6-1-4. As in the case of PAM, an AGC is required to maintain a constant 
average power signal at the input to the demodulator. We observe that the 
demodulator is similar to a PSK demodulator, in that both generate in-phase 
and quadrature signal samples $(X, Y)$ for the detector. In the case of QAM, 
the detector computes the euclidean distance between the received noise- 
corrupted signal point and the $M$ possible transmitted points, and selects the 
signal closest to the received point. 

FIGURE 6-1-2 Block diagram of M-ary PSK receiver. 

6-2 CARRIER PHASE ESTIMATION 

There are two basic approaches for dealing with carrier synchronization at the 
receiver. One is to multiplex, usually in frequency, a special signal, called a 
pilot signal, that allows the receiver to extract and, thus, to synchronize its 
local oscillator to the carrier frequency and phase of the received signal. When 
an unmodulated carrier component is transmitted along with the information- 
bearing signal, the receiver employs a phase-locked loop (PLL) to acquire and 
track the carrier component. The PLL is designed to have a narrow bandwidth 
so that it is not significantly affected by the presence of frequency components 
from the information-bearing signal. 

FIGURE 6-1-3 Block diagram of M-ary PAM receiver. 

FIGURE 6-1-4 Block diagram of QAM receiver. 

The second approach, which appears to be more prevalent in practice, is to 
derive the carrier phase estimate directly from the modulated signal. This 
approach has the distinct advantage that the total transmitter power is 
allocated to the transmission of the information-bearing signal. In our 
treatment of carrier recovery, we confine our attention to the second approach; 
hence, we assume that the signal is transmitted via suppressed carrier. 

In order to emphasize the importance of extracting an accurate phase 
estimate, let us consider the effect of a carrier phase error on the demodulation 
of a double-sideband, suppressed carrier (DSB/SC) signal. To be specific, 
suppose we have an amplitude-modulated signal of the form 

$$s(t) = A(t)\cos(2\pi f_c t + \phi) \tag{6-2-1}$$

If we demodulate the signal by multiplying $s(t)$ with the carrier reference 

$$c(t) = \cos(2\pi f_c t + \hat\phi) \tag{6-2-2}$$

we obtain 

$$c(t)s(t) = \tfrac{1}{2}A(t)\cos(\phi - \hat\phi) + \tfrac{1}{2}A(t)\cos(4\pi f_c t + \phi + \hat\phi)$$

The double-frequency component may be removed by passing the product 
signal $c(t)s(t)$ through a lowpass filter. This filtering yields the information- 
bearing signal 

$$y(t) = \tfrac{1}{2}A(t)\cos(\phi - \hat\phi) \tag{6-2-3}$$

Note that the effect of the phase error $\phi - \hat\phi$ is to reduce the signal level in 
voltage by a factor $\cos(\phi - \hat\phi)$ and in power by a factor $\cos^2(\phi - \hat\phi)$. Hence, a 
phase error of 10° results in a signal power loss of 0.13 dB, and a phase error of 
30° results in a signal power loss of 1.25 dB in an amplitude-modulated signal. 
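
The two loss figures quoted above can be checked directly from the power factor 
$\cos^2(\phi - \hat\phi)$. The short sketch below does so; the function name is a 
hypothetical one chosen for illustration. 

```python
import math

def power_loss_db(phase_error_deg):
    """Signal power loss in dB caused by the factor cos^2(phase error)."""
    c = math.cos(math.radians(phase_error_deg))
    return -10.0 * math.log10(c * c)

for deg in (10.0, 30.0):
    print(f"{deg:4.0f} deg -> {power_loss_db(deg):.2f} dB")
```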

The effect of carrier phase errors in QAM and multiphase PSK is much 
more severe. The QAM and M-PSK signals may be represented as 

$$s(t) = A(t)\cos(2\pi f_c t + \phi) - B(t)\sin(2\pi f_c t + \phi) \tag{6-2-4}$$

This signal is demodulated by the two quadrature carriers 

$$c_c(t) = \cos(2\pi f_c t + \hat\phi), \qquad c_s(t) = -\sin(2\pi f_c t + \hat\phi) \tag{6-2-5}$$

Multiplication of $s(t)$ with $c_c(t)$ followed by lowpass filtering yields the in-phase 
component 

$$y_I(t) = \tfrac{1}{2}A(t)\cos(\phi - \hat\phi) - \tfrac{1}{2}B(t)\sin(\phi - \hat\phi) \tag{6-2-6}$$

Similarly, multiplication of $s(t)$ by $c_s(t)$ followed by lowpass filtering yields the 
quadrature component 

$$y_Q(t) = \tfrac{1}{2}B(t)\cos(\phi - \hat\phi) + \tfrac{1}{2}A(t)\sin(\phi - \hat\phi) \tag{6-2-7}$$

The expressions (6-2-6) and (6-2-7) clearly indicate that the phase error in the 
demodulation of QAM and M-PSK signals has a much more severe effect than 
in the demodulation of a PAM signal. Not only is there a reduction in the 
power of the desired signal component by a factor $\cos^2(\phi - \hat\phi)$, but there is 
also crosstalk interference from the in-phase and quadrature components. 
Since the average power levels of $A(t)$ and $B(t)$ are similar, a small phase error 
causes a large degradation in performance. Hence, the phase accuracy 
requirements for QAM and multiphase coherent PSK are much higher than 
those for DSB/SC PAM. 
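
As a numerical illustration of the crosstalk in (6-2-6) and (6-2-7), the sketch 
below evaluates the in-phase and quadrature outputs for a single symbol. The 
amplitudes $A = B = 1$ and the function name are assumptions made for this 
example only. 

```python
import math

def demodulate_with_phase_error(a, b, phase_error_rad):
    """Return (y_I, y_Q) from (6-2-6) and (6-2-7) for in-phase amplitude a,
    quadrature amplitude b, and phase error phi - phi_hat."""
    c, s = math.cos(phase_error_rad), math.sin(phase_error_rad)
    y_i = 0.5 * a * c - 0.5 * b * s
    y_q = 0.5 * b * c + 0.5 * a * s
    return y_i, y_q

a, b = 1.0, 1.0   # assumed symbol amplitudes (illustrative)
for deg in (0.0, 10.0, 30.0):
    y_i, y_q = demodulate_with_phase_error(a, b, math.radians(deg))
    print(f"{deg:4.0f} deg: y_I = {y_i:+.3f}, y_Q = {y_q:+.3f}")
```

With $B = 0$ the same function reduces to the PAM case (6-2-3), in which the 
phase error only scales the output; with $B \ne 0$ the quadrature amplitude 
leaks into the in-phase output and vice versa. 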

6-2-1 Maximum-Likelihood Carrier Phase Estimation 

First, we derive the maximum-likelihood carrier phase estimate. For simplicity, 
we assume that the delay $\tau$ is known and, in particular, we set $\tau = 0$. The 
function to be maximized is the likelihood function given in (6-1-8). With $\phi$ 
substituted for $\psi$, this function becomes 

$$\Lambda(\phi) = \exp\left\{-\frac{1}{N_0}\int_{T_0}[r(t) - s(t; \phi)]^2\,dt\right\}$$

$$= \exp\left\{-\frac{1}{N_0}\int_{T_0} r^2(t)\,dt + \frac{2}{N_0}\int_{T_0} r(t)s(t; \phi)\,dt - \frac{1}{N_0}\int_{T_0} s^2(t; \phi)\,dt\right\} \tag{6-2-8}$$

Note that the first term of the exponential factor does not involve the signal 
parameter $\phi$. The third term, which contains the integral of $s^2(t; \phi)$, is a 
constant equal to the signal energy over the observation interval $T_0$ for any 
value of $\phi$. Only the second term, which involves the cross-correlation of the 
received signal $r(t)$ with the signal $s(t; \phi)$, depends on the choice of $\phi$. 
Therefore, the likelihood function $\Lambda(\phi)$ may be expressed as 

$$\Lambda(\phi) = C\exp\left[\frac{2}{N_0}\int_{T_0} r(t)s(t; \phi)\,dt\right] \tag{6-2-9}$$

where $C$ is a constant independent of $\phi$. 

The ML estimate $\hat\phi_{ML}$ is the value of $\phi$ that maximizes $\Lambda(\phi)$ in (6-2-9). 
Equivalently, the value $\hat\phi_{ML}$ also maximizes the logarithm of $\Lambda(\phi)$, i.e., the 
log-likelihood function 

$$\Lambda_L(\phi) = \frac{2}{N_0}\int_{T_0} r(t)s(t; \phi)\,dt \tag{6-2-10}$$

Note that in defining $\Lambda_L(\phi)$ we have ignored the constant term $\ln C$. 
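
The equivalence between maximizing the correlation in (6-2-10) and maximizing 
the full likelihood in (6-2-8) can be checked numerically. The sketch below 
replaces the integrals by sums over samples and searches a grid of candidate 
phases; the sampling rate, carrier frequency, noise level, grid resolution, and 
all function names are assumptions chosen for illustration. The observation 
window is taken to span a whole number of carrier cycles so that the signal 
energy is the same for every candidate phase, as the argument above requires. 

```python
import math
import random

def reference(phase, fc_hz, fs_hz, n, amp):
    """Samples of s(t; phi) = amp*cos(2*pi*fc*t + phi) on a uniform grid."""
    return [amp * math.cos(2.0 * math.pi * fc_hz * k / fs_hz + phase)
            for k in range(n)]

def correlation_metric(r, s):
    """Discrete stand-in for the integral of r(t)s(t; phi) in (6-2-10)."""
    return sum(rk * sk for rk, sk in zip(r, s))

def squared_error_metric(r, s):
    """Discrete stand-in for the integral of [r(t) - s(t; phi)]^2 in (6-2-8)."""
    return sum((rk - sk) ** 2 for rk, sk in zip(r, s))

# Assumed, illustrative values (not from the text).
fc_hz, fs_hz, n, amp = 10.0, 1000.0, 1000, 1.0   # window holds 10 full cycles
true_phase, noise_std = 0.6, 0.5
rng = random.Random(1)
r = [sk + rng.gauss(0.0, noise_std)
     for sk in reference(true_phase, fc_hz, fs_hz, n, amp)]

# Candidate phases on a 1-degree grid over [-pi, pi).
grid = [-math.pi + 2.0 * math.pi * i / 360 for i in range(360)]
candidates = [reference(p, fc_hz, fs_hz, n, amp) for p in grid]

best_by_correlation = max(range(360),
                          key=lambda i: correlation_metric(r, candidates[i]))
best_by_squared_error = min(range(360),
                            key=lambda i: squared_error_metric(r, candidates[i]))
phase_estimate = grid[best_by_correlation]
print(best_by_correlation == best_by_squared_error, round(phase_estimate, 3))
```

A grid search is used here only because it makes the comparison transparent; 
it is not the estimator structure developed in the text. 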


Example 6-2-1 

As an example of the optimization to determine the carrier phase, let us 
consider the transmission of the unmodulated carrier $A\cos 2\pi f_c t$. The 
received signal is 

$$r(t) = A\cos(2\pi f_c t + \phi) + n(t)$$

where $\phi$ is the unknown phase. We seek the value of $\phi$, say $\hat\phi_{ML}$, that 
maximizes 

$$\Lambda_L(\phi) = \frac{2A}{N_0}\int_{T_0} r(t)\cos(2\pi f_c t + \phi)\,dt$$

A necessary condition for a maximum is that 

$$\frac{d\Lambda_L(\phi)}{d\phi} = 0$$

This condition yields 

$$\int_{T_0} r(t)\sin(2\pi f_c t + \hat\phi_{ML})\,dt = 0 \tag{6-2-11}$$

or, equivalently, 

$$\hat\phi_{ML} = -\tan^{-1}\left[\int_{T_0} r(t)\sin 2\pi f_c t\,dt \Big/ \int_{T_0} r(t)\cos 2\pi f_c t\,dt\right] \tag{6-2-12}$$

We observe that the optimality condition given by (6-2-11) implies the use 
of a loop to extract the estimate as illustrated in Fig. 6-2-1. The loop filter is 
an integrator whose bandwidth is proportional to the reciprocal of the 
integration interval $T_0$. On the other hand, (6-2-12) implies an 
implementation that uses quadrature carriers to cross-correlate with $r(t)$. 
Then, $\hat\phi_{ML}$ is the inverse tangent of the ratio of these two correlator 
outputs, as shown in Fig. 6-2-2. Note that this estimation scheme yields $\hat\phi_{ML}$ 
explicitly. 

FIGURE 6-2-1 A PLL for obtaining the ML estimate of the phase of an 
unmodulated carrier. 

FIGURE 6-2-2 A (one-shot) ML estimate of the phase of an 
unmodulated carrier. 

This example clearly demonstrates that the PLL provides the ML estimate 
of the phase of an unmodulated carrier. 
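
The one-shot estimate (6-2-12) is straightforward to try on sampled data. The 
following is a minimal sketch in which the two integrals are replaced by sums; 
the carrier frequency, sampling rate, noise level, and function name are 
assumptions for illustration. The two-argument arctangent is used in place of 
$\tan^{-1}$ so that the estimate is resolved over the full range $(-\pi, \pi]$, which 
is an implementation choice rather than something stated in the text. 

```python
import math
import random

def one_shot_phase_estimate(r, fc_hz, fs_hz):
    """Discrete form of (6-2-12): correlate r with quadrature carriers
    and take minus the arctangent of the sine/cosine correlator ratio."""
    c = sum(rk * math.cos(2.0 * math.pi * fc_hz * k / fs_hz)
            for k, rk in enumerate(r))
    s = sum(rk * math.sin(2.0 * math.pi * fc_hz * k / fs_hz)
            for k, rk in enumerate(r))
    return math.atan2(-s, c)

# Assumed, illustrative values (not from the text).
fc_hz, fs_hz, n = 10.0, 1000.0, 2000   # window holds 20 full carrier cycles
amp, true_phase, noise_std = 1.0, 0.7, 0.5
rng = random.Random(7)
r = [amp * math.cos(2.0 * math.pi * fc_hz * k / fs_hz + true_phase)
     + rng.gauss(0.0, noise_std) for k in range(n)]

phi_hat = one_shot_phase_estimate(r, fc_hz, fs_hz)
print(f"true phase = {true_phase:.3f} rad, estimate = {phi_hat:.3f} rad")
```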


6-2-2 The Phase-Locked Loop 

The PLL basically consists of a multiplier, a loop filter, and a voltage- 
controlled oscillator (VCO), as shown in Fig. 6-2-3. If we assume that the input 
to the PLL is the sinusoid $\cos(2\pi f_c t + \phi)$ and the output of the VCO is 
$\sin(2\pi f_c t + \hat\phi)$, where $\hat\phi$ represents the estimate of $\phi$, the product of these 
two signals is 

$$e(t) = \cos(2\pi f_c t + \phi)\sin(2\pi f_c t + \hat\phi)$$

$$= \tfrac{1}{2}\sin(\hat\phi - \phi) + \tfrac{1}{2}\sin(4\pi f_c t + \phi + \hat\phi) \tag{6-2-13}$$

FIGURE 6-2-3 Basic elements of a phase-locked loop (PLL). 

The loop filter is a lowpass filter that responds only to the low-frequency 
component $\tfrac{1}{2}\sin(\hat\phi - \phi)$ and removes the component at $2f_c$. This filter is 
usually selected to have the relatively simple transfer function 

$$G(s) = \frac{1 + \tau_2 s}{1 + \tau_1 s} \tag{6-2-14}$$

where $\tau_1$ and $\tau_2$ are design parameters ($\tau_1 > \tau_2$) that control the bandwidth of 
the loop. A higher-order filter that contains additional poles may be used if 
necessary to obtain a better loop response. 

The output of the loop filter provides the control voltage $v(t)$ for the VCO. 
The VCO is basically a sinusoidal signal generator with an instantaneous phase 
given by 

$$2\pi f_c t + \hat\phi(t) = 2\pi f_c t + K\int_{-\infty}^{t} v(\tau)\,d\tau \tag{6-2-15}$$

where $K$ is a gain constant in rad/V. Hence, 

$$\hat\phi(t) = K\int_{-\infty}^{t} v(\tau)\,d\tau \tag{6-2-16}$$

By neglecting the double-frequency term resulting from the multiplication of 
the input signal with the output of the VCO, we may reduce the PLL into the 
equivalent closed-loop system model shown in Fig. 6-2-4. The sine function of 
the phase difference $\hat\phi - \phi$ makes this system nonlinear, and, as a 
consequence, the analysis of its performance in the presence of noise is somewhat 
involved but, nevertheless, it is mathematically tractable for some simple loop 
filters. 

In normal operation when the loop is tracking the phase of the incoming 
carrier, the phase error $\hat\phi - \phi$ is small and, hence, 

$$\sin(\hat\phi - \phi) \approx \hat\phi - \phi \tag{6-2-17}$$

With this approximation, the PLL becomes linear and is characterized by the 
closed-loop transfer function 

$$H(s) = \frac{KG(s)/s}{1 + KG(s)/s} \tag{6-2-18}$$

where the factor of $\tfrac{1}{2}$ has been absorbed into the gain parameter $K$. By 
substituting from (6-2-14) for $G(s)$ into (6-2-18), we obtain 

$$H(s) = \frac{1 + \tau_2 s}{1 + (\tau_2 + 1/K)s + (\tau_1/K)s^2} \tag{6-2-19}$$

FIGURE 6-2-4 Model of phase-locked loop. 


Hence, the closed-loop system for the linearized PLL is second-order when 
$G(s)$ is given by (6-2-14). The parameter $\tau_2$ controls the position of the zero, 
while $K$ and $\tau_1$ are used to control the position of the closed-loop system poles. 
It is customary to express the denominator of $H(s)$ in the standard form 

$$D(s) = s^2 + 2\zeta\omega_n s + \omega_n^2 \tag{6-2-20}$$

where $\zeta$ is called the loop damping factor and $\omega_n$ is the natural frequency of the 
loop. In terms of the loop parameters, $\omega_n = \sqrt{K/\tau_1}$ and $\zeta = \omega_n(\tau_2 + 1/K)/2$, and 
the closed-loop transfer function becomes 

$$H(s) = \frac{(2\zeta\omega_n - \omega_n^2/K)s + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{6-2-21}$$

The (one-sided) noise-equivalent bandwidth (see Problem 2-24) of the loop is 

$$B_{eq} = \frac{\tau_2^2(1/\tau_2^2 + K/\tau_1)}{4(\tau_2 + 1/K)} = \frac{1 + (\tau_2\omega_n)^2}{8\zeta/\omega_n} \tag{6-2-22}$$

The magnitude response $20\log|H(\omega)|$ as a function of the normalized 
frequency $\omega/\omega_n$ is illustrated in Fig. 6-2-5, with the damping factor $\zeta$ as a 
parameter and $\tau_1 \gg 1$. Note that $\zeta = 1$ results in a critically damped loop 
response, $\zeta < 1$ produces an underdamped response, and $\zeta > 1$ yields an 
overdamped response. 

In practice, the selection of the bandwidth of the PLL involves a trade-off 
between speed of response and noise in the phase estimate, which is the topic 
considered below. On the one hand, it is desirable to select the bandwidth of 
the loop to be sufficiently wide to track any time variations in the phase of the 
received carrier. On the other hand, a wideband PLL allows more noise to pass into 
the loop, which corrupts the phase estimate. Below, we assess the effects of 
noise in the quality of the phase estimate. 
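
To see how the design parameters of the loop filter (6-2-14) map into the 
quantities that govern this trade-off, the sketch below computes the natural 
frequency, the damping factor, and the noise-equivalent bandwidth (6-2-22). It 
uses $\omega_n = \sqrt{K/\tau_1}$ and $\zeta = \omega_n(\tau_2 + 1/K)/2$, which follow from matching 
the denominator of (6-2-19) to the standard form (6-2-20). The numerical values 
of $K$, $\tau_1$, and $\tau_2$ and the function name are assumptions chosen only for 
illustration. 

```python
import math

def loop_parameters(tau1, tau2, gain_k):
    """Return (omega_n, zeta, B_eq) for the second-order linearized PLL
    with loop filter G(s) = (1 + tau2*s) / (1 + tau1*s) and loop gain K."""
    omega_n = math.sqrt(gain_k / tau1)
    zeta = 0.5 * omega_n * (tau2 + 1.0 / gain_k)
    b_eq = (1.0 + (tau2 * omega_n) ** 2) / (8.0 * zeta / omega_n)
    return omega_n, zeta, b_eq

# Assumed, illustrative values (not from the text).
tau1, tau2, gain_k = 1.0, 0.014, 1.0e4

omega_n, zeta, b_eq = loop_parameters(tau1, tau2, gain_k)
print(f"omega_n = {omega_n:.1f} rad/s, zeta = {zeta:.3f}, B_eq = {b_eq:.2f} Hz")

# The first form of (6-2-22), written directly in tau1, tau2, and K.
b_eq_direct = (tau2 ** 2 * (1.0 / tau2 ** 2 + gain_k / tau1)
               / (4.0 * (tau2 + 1.0 / gain_k)))
```

Increasing $K$ or decreasing $\tau_1$ raises $\omega_n$ and widens the loop, which speeds 
up tracking at the cost of admitting more noise. 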


6-2-3 Effect of Additive Noise on the Phase Estimate 

In order to evaluate the effects of noise on the estimate of the carrier phase, let 
us assume that the noise at the input to the PLL is narrowband. For this 
analysis, we assume that the PLL is tracking a sinusoidal signal of the form 

$$s(t) = A_c\cos[2\pi f_c t + \phi(t)] \tag{6-2-23}$$

that is corrupted by the additive narrowband noise 

$$n(t) = x(t)\cos 2\pi f_c t - y(t)\sin 2\pi f_c t \tag{6-2-24}$$

The in-phase and quadrature components of the noise are assumed to be 
statistically independent, stationary gaussian noise processes with (two-sided) 
power spectral density $\tfrac{1}{2}N_0$ W/Hz. By using simple trigonometric identities, the 
noise term in (6-2-24) can be expressed as 

$$n(t) = n_c(t)\cos[2\pi f_c t + \phi(t)] - n_s(t)\sin[2\pi f_c t + \phi(t)] \tag{6-2-25}$$

where 

$$n_c(t) = x(t)\cos\phi(t) + y(t)\sin\phi(t)$$
$$n_s(t) = -x(t)\sin\phi(t) + y(t)\cos\phi(t) \tag{6-2-26}$$

We note that 

$$n_c(t) + jn_s(t) = [x(t) + jy(t)]e^{-j\phi(t)}$$

so that the quadrature components $n_c(t)$ and $n_s(t)$ have exactly the same 
statistical characteristics as $x(t)$ and $y(t)$. 

FIGURE 6-2-5 Frequency response of a second-order loop. [From Phaselock Techniques, 
2nd edition, by F. M. Gardner, © 1979 by John Wiley and Sons, Inc. Reprinted with 
permission of the publisher.] 

If $s(t) + n(t)$ is multiplied by the output of the VCO and the double- 
frequency terms are neglected, the input to the loop filter is the noise- 
corrupted signal 

$$e(t) = A_c\sin\Delta\phi + n_c(t)\sin\Delta\phi - n_s(t)\cos\Delta\phi$$

$$= A_c\sin\Delta\phi + n_1(t) \tag{6-2-27}$$

where, by definition, $\Delta\phi = \hat\phi - \phi$ is the phase error. Thus, we have the 
equivalent model for the PLL with additive noise as shown in Fig. 6-2-6. 

FIGURE 6-2-6 Equivalent PLL model with additive noise. 

When the power $P_c = \tfrac{1}{2}A_c^2$ of the incoming signal is much larger than the 
noise power, we may linearize the PLL and, thus, easily determine the effect of 
the additive noise on the quality of the estimate $\hat\phi$. Under these conditions, the 
model for the linearized PLL with additive noise is illustrated in Fig. 6-2-7. 
Note that the gain parameter $A_c$ may be normalized to unity, provided that the 
noise terms are scaled by $1/A_c$, i.e., the noise terms become 

$$n_2(t) = \frac{n_c(t)}{A_c}\sin\Delta\phi - \frac{n_s(t)}{A_c}\cos\Delta\phi \tag{6-2-28}$$

Since the noise $n_2(t)$ is additive at the input to the loop, the variance of the 
phase error $\Delta\phi$, which is also the variance of the VCO output phase, is 

$$\sigma_{\hat\phi}^2 = \frac{N_0 B_{eq}}{A_c^2} \tag{6-2-29}$$

where $B_{eq}$ is the (one-sided) equivalent noise bandwidth of the loop, given in 
(6-2-22). Note that $\sigma_{\hat\phi}^2$ is simply the ratio of total noise power within the 
bandwidth of the PLL divided by the signal power $A_c^2$. Hence, 

$$\sigma_{\hat\phi}^2 = 1/\gamma_L \tag{6-2-30}$$

where $\gamma_L$ is defined as the signal-to-noise ratio 

$$\mathrm{SNR} \equiv \gamma_L = \frac{A_c^2}{N_0 B_{eq}} \tag{6-2-31}$$







FIGURE 6-2-7 Linearized PLL model with additive noise. 

FIGURE 6-2-8 Comparison of VCO phase variance for exact and approximate 
(linear model) first-order PLL. [From Principles of Coherent 
Communication, by A. J. Viterbi; © 1966 by McGraw-Hill 
Book Company. Reprinted with permission of the publisher.] 

The expression for the variance $\sigma_{\hat\phi}^2$ of the VCO phase error applies to the 
case where the SNR is sufficiently high that the linear model for the PLL 
applies. An exact analysis based on the nonlinear PLL is mathematically 
tractable when $G(s) = 1$, which results in a first-order loop. In this case, the 
probability density function for the phase error may be derived (see Viterbi, 
1966) and has the form 

$$p(\Delta\phi) = \frac{\exp(\gamma_L\cos\Delta\phi)}{2\pi I_0(\gamma_L)} \tag{6-2-32}$$

where $\gamma_L$ is the SNR given by (6-2-31) with $B_{eq}$ being the appropriate noise 
bandwidth of the first-order loop, and $I_0(\cdot)$ is the modified Bessel function of 
order zero. 

From the expression for $p(\Delta\phi)$, we may obtain the exact value of the 
variance for the phase error on a first-order PLL. This is plotted in Fig. 6-2-8 as 
a function of $1/\gamma_L$. Also shown for comparison is the result obtained with the 
linearized PLL model. Note that the variance for the linear model is close to 
the exact variance for $\gamma_L > 3$. Hence, the linear model is adequate for practical 
purposes. 
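
The comparison between the exact and linearized variances can be reproduced 
numerically from the density (6-2-32). The sketch below integrates 
$\Delta\phi^2\,p(\Delta\phi)$ over $(-\pi, \pi)$ with a midpoint rule and compares the result with 
the linear-model value $1/\gamma_L$. The normalization is obtained from the same 
numerical integral, so no Bessel-function routine is needed; the step count and 
the function name are assumptions made for this illustration. 

```python
import math

def exact_phase_variance(gamma_l, steps=20000):
    """Variance of the phase error under the density
    exp(gamma_l * cos(x)) / (2*pi*I0(gamma_l)) on (-pi, pi),
    computed with a midpoint rule and a numerically obtained normalization."""
    h = 2.0 * math.pi / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        x = -math.pi + (i + 0.5) * h
        # Subtracting 1 in the exponent rescales numerator and denominator
        # alike and avoids overflow for large gamma_l.
        w = math.exp(gamma_l * (math.cos(x) - 1.0))
        num += x * x * w
        den += w
    return num / den

for gamma_l in (1.0, 3.0, 10.0, 100.0):
    exact = exact_phase_variance(gamma_l)
    linear = 1.0 / gamma_l
    print(f"gamma_L = {gamma_l:6.1f}: exact = {exact:.4f}, linear = {linear:.4f}")
```

As $\gamma_L \to 0$ the density approaches a uniform distribution on $(-\pi, \pi)$, whose 
variance is $\pi^2/3$, whereas the linear model predicts an unbounded variance. 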

Approximate analyses of the statistical characteristics of the phase error for 
the nonlinear PLL have also been performed. Of particular importance is the 
transient behavior of the PLL during initial acquisition. Another important 
problem is the behavior of PLL at low SNR. It is known, for example, that 
when the SNR at the input to the PLL drops below a certain value, there is a 
rapid deterioration in the performance of the PLL. The loop begins to lose 
lock and an impulsive-type of noise, characterized as clicks, is generated which 
degrades the performance of the loop. Results on these topics can be found in 
the texts by Viterbi (1966), Lindsey (1972), Lindsey and Simon (1973), and 
Gardner (1979), and in the survey papers by Gupta (1975) and Lindsey and 
Chie (1981). 

Up to this point, we have considered carrier phase estimation when the 
carrier signal is unmodulated. Below, we consider carrier phase recovery when 
the signal carries information. 


6-2-4 Decision-Directed Loops 

A problem arises in maximizing either (6-2-9) or (6-2-10) when the signal 
$s(t; \phi)$ carries the information sequence $\{I_n\}$. In this case we can adopt one of 
two approaches: either we assume that $\{I_n\}$ is known or we treat $\{I_n\}$ as a 
random sequence and average over its statistics. 

In decision-directed parameter estimation, we assume that the information 
sequence $\{I_n\}$ over the observation interval has been estimated and, in the 
absence of demodulation errors, $\tilde I_n = I_n$, where $\tilde I_n$ denotes the detected value of 
the information $I_n$. In this case $s(t; \phi)$ is completely known except for the 
carrier phase. Decision-directed phase estimation was first described by 
Proakis et al. (1964). 

To be specific, let us consider the decision-directed phase estimate for the 
class of linear modulation techniques for which the received equivalent lowpass 
signal may be expressed as 

$$r(t) = e^{-j\phi}\sum_n I_n g(t - nT) + z(t)$$

$$= s_l(t)e^{-j\phi} + z(t) \tag{6-2-33}$$

where $s_l(t)$ is a known signal if the sequence $\{I_n\}$ is assumed known. The 
likelihood function and corresponding log-likelihood function for the 
equivalent lowpass signal are 

$$\Lambda(\phi) = C\exp\left\{\mathrm{Re}\left[\frac{1}{N_0}\int_{T_0} r(t)s_l^*(t)e^{j\phi}\,dt\right]\right\} \tag{6-2-34}$$

$$\Lambda_L(\phi) = \mathrm{Re}\left\{\left[\frac{1}{N_0}\int_{T_0} r(t)s_l^*(t)\,dt\right]e^{j\phi}\right\} \tag{6-2-35}$$

If we substitute for $s_l(t)$ in (6-2-35) and assume that the observation interval 
$T_0 = KT$, where $K$ is a positive integer, we obtain 

$$\Lambda_L(\phi) = \mathrm{Re}\left\{e^{j\phi}\frac{1}{N_0}\sum_{n=0}^{K-1} I_n^*\int_{nT}^{(n+1)T} r(t)g^*(t - nT)\,dt\right\}$$

$$= \mathrm{Re}\left\{e^{j\phi}\frac{1}{N_0}\sum_{n=0}^{K-1} I_n^* y_n\right\} \tag{6-2-36}$$

where, by definition 

$$y_n = \int_{nT}^{(n+1)T} r(t)g^*(t - nT)\,dt \tag{6-2-37}$$

Note that $y_n$ is the output of the matched filter in the $n$th signal interval. The 
ML estimate of $\phi$ is easily found from (6-2-36) by differentiating the 
log-likelihood 

$$\Lambda_L(\phi) = \mathrm{Re}\left(\frac{1}{N_0}\sum_{n=0}^{K-1} I_n^* y_n\right)\cos\phi - \mathrm{Im}\left(\frac{1}{N_0}\sum_{n=0}^{K-1} I_n^* y_n\right)\sin\phi$$

with respect to $\phi$ and setting the derivative equal to zero. Thus, we obtain 

$$\hat\phi_{ML} = -\tan^{-1}\left[\mathrm{Im}\left(\sum_{n=0}^{K-1} I_n^* y_n\right) \Big/ \mathrm{Re}\left(\sum_{n=0}^{K-1} I_n^* y_n\right)\right] \tag{6-2-38}$$

We call $\hat\phi_{ML}$ in (6-2-38) the decision-directed (or decision-feedback) carrier 
phase estimate. It is easily shown (Problem 6-10) that the mean value of $\hat\phi_{ML}$ is 
$\phi$, so that the estimate is unbiased. Furthermore, the pdf of $\hat\phi_{ML}$ can be 
obtained (Problem 6-11) by using the procedure described in Section 5-2-7. 

FIGURE 6-2-9 Carrier recovery with a decision-feedback PLL. 
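
A minimal simulation of the decision-directed estimate (6-2-38) is sketched 
below. It works directly with matched-filter outputs modeled as 
$y_n = I_n e^{-j\phi} + \text{noise}$, which assumes a unit-energy pulse and correct symbol 
decisions; the QPSK alphabet, block length, noise level, and function name are 
likewise assumptions made for illustration. As in any such implementation, the 
two-argument arctangent is used to resolve the quadrant. 

```python
import cmath
import math
import random

def decision_directed_phase(symbols, matched_filter_outputs):
    """Estimate (6-2-38): minus the angle of sum(conj(I_n) * y_n)."""
    acc = sum(i_n.conjugate() * y_n
              for i_n, y_n in zip(symbols, matched_filter_outputs))
    return -math.atan2(acc.imag, acc.real)

# Assumed, illustrative setup (not from the text).
rng = random.Random(3)
true_phase, noise_std, num_symbols = 0.3, 0.1, 200
qpsk = [cmath.exp(1j * (math.pi / 4 + math.pi / 2 * m)) for m in range(4)]

symbols = [rng.choice(qpsk) for _ in range(num_symbols)]
outputs = [i_n * cmath.exp(-1j * true_phase)
           + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
           for i_n in symbols]

phi_hat = decision_directed_phase(symbols, outputs)
print(f"true phase = {true_phase:.3f} rad, estimate = {phi_hat:.3f} rad")
```

Multiplying each $y_n$ by $I_n^*$ strips the modulation, so the sum accumulates 
coherently along $e^{-j\phi}$ while the noise terms average out. 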

A decision-feedback PLL (DFPLL) that is appropriate for a double- 
sideband PAM signal of the form $A(t)\cos(2\pi f_c t + \phi)$ is shown in Fig. 6-2-9. 
The received signal is multiplied by the quadrature carriers $c_c(t)$ and $c_s(t)$, as 
given by (6-2-5), which are derived from the VCO. The product signal 

$$r(t)\cos(2\pi f_c t + \hat\phi) = \tfrac{1}{2}[A(t) + n_c(t)]\cos\Delta\phi - \tfrac{1}{2}n_s(t)\sin\Delta\phi + \text{double-frequency terms} \tag{6-2-39}$$

is used to recover the information carried by $A(t)$. The detector makes a 
decision on the symbol that is received every $T$ seconds. Thus, in the absence 
of decision errors, it reconstructs $A(t)$ free of any noise. This reconstructed 
signal is used to multiply the product of the second quadrature multiplier, 
which has been delayed by $T$ seconds to allow the demodulator to reach a 
decision. Thus, the input to the loop filter in the absence of decision errors is 
the error signal 

$$e(t) = \tfrac{1}{2}A(t)\{[A(t) + n_c(t)]\sin\Delta\phi - n_s(t)\cos\Delta\phi\} + \text{double-frequency terms}$$

$$= \tfrac{1}{2}A^2(t)\sin\Delta\phi + \tfrac{1}{2}A(t)[n_c(t)\sin\Delta\phi - n_s(t)\cos\Delta\phi] + \text{double-frequency terms} \tag{6-2-40}$$

The loop filter is lowpass and, hence, it rejects the double-frequency term in 
$e(t)$. The desired component is $A^2(t)\sin\Delta\phi$, which contains the phase error for 
driving the loop. 

FIGURE 6-2-10 Carrier recovery for M-ary PSK using a decision-feedback PLL. 

In the case of M-ary PSK, the DFPLL has the configuration shown in Fig. 
6-2-10. The received signal is demodulated to yield the phase estimate 

$$\hat\theta_m = \frac{2\pi}{M}(m - 1)$$

which, in the absence of a decision error, is the transmitted signal phase $\theta_m$. The 
two outputs of the quadrature multipliers are delayed by the symbol duration 
$T$ and multiplied by $\sin\theta_m$ and $\cos\theta_m$, respectively, to yield 

$$r(t)\cos(2\pi f_c t + \hat\phi)\sin\theta_m = \tfrac{1}{2}[A\cos\theta_m + n_c(t)]\sin\theta_m\cos(\phi - \hat\phi)$$
$$\qquad - \tfrac{1}{2}[A\sin\theta_m + n_s(t)]\sin\theta_m\sin(\phi - \hat\phi) + \text{double-frequency terms}$$

$$r(t)\sin(2\pi f_c t + \hat\phi)\cos\theta_m = -\tfrac{1}{2}[A\cos\theta_m + n_c(t)]\cos\theta_m\sin(\phi - \hat\phi)$$
$$\qquad - \tfrac{1}{2}[A\sin\theta_m + n_s(t)]\cos\theta_m\cos(\phi - \hat\phi) + \text{double-frequency terms} \tag{6-2-41}$$

The two signals are added to generate the error signal 

$$e(t) = -\tfrac{1}{2}A\sin(\phi - \hat\phi) - \tfrac{1}{2}n_c(t)\sin(\phi - \hat\phi - \theta_m) - \tfrac{1}{2}n_s(t)\cos(\phi - \hat\phi - \theta_m) + \text{double-frequency terms} \tag{6-2-42}$$

This error signal is the input to the loop filter that provides the control signal 
for the VCO. 

We observe that the two quadrature noise components in (6-2-42) appear as 
additive terms. There is no term involving a product of two noise components 
as in an Mth-power-law device, described in the next section. Consequently, 
there is no additional power loss associated with the decision-feedback PLL. 

This M-phase tracking loop has a phase ambiguity of $360°/M$, which makes it 
necessary to differentially encode the information sequence prior to transmission 
and differentially decode the received sequence after demodulation to 
recover the information. 
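
The role of differential encoding in removing the $360°/M$ ambiguity can be 
illustrated with phase indices modulo $M$. In the sketch below, the information 
is carried by the difference between successive transmitted phase indices, so an 
unknown constant rotation by a multiple of $360°/M$ cancels in the decoder. The 
function names, the choice $M = 4$, and the use of an initial reference symbol 
are assumptions made for this illustration. 

```python
def diff_encode(data, m):
    """Map data symbols (integers 0..m-1) to transmitted phase indices.
    The first output is a reference symbol that carries no data."""
    state = 0
    out = [state]
    for a in data:
        state = (state + a) % m
        out.append(state)
    return out

def diff_decode(received, m):
    """Recover data symbols as differences of successive phase indices."""
    return [(received[k] - received[k - 1]) % m
            for k in range(1, len(received))]

m = 4                              # assumed: 4-phase PSK
data = [0, 3, 1, 2, 2, 0, 1]       # assumed data sequence
tx = diff_encode(data, m)

# An ambiguity of c * 360/M degrees adds the same offset c to every index.
for c in range(m):
    rx = [(p + c) % m for p in tx]
    print(c, diff_decode(rx, m) == data)
```

The price of this robustness, not shown here, is that a single symbol error 
in the received sequence affects two consecutive decoded symbols. 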

The ML estimate in (6-2-38) is also appropriate for QAM. The ML estimate
for offset QPSK is also easily obtained (Problem 6-12) by maximizing the
log-likelihood function in (6-2-35), with $s_l(t)$ given as

$$ s_l(t) = \sum_n I_n g(t - nT) + j\sum_n J_n g(t - nT - \tfrac12 T) \tag{6-2-43} $$

where $I_n = \pm 1$ and $J_n = \pm 1$.

Finally, we should also mention that carrier phase recovery for CPM signals
can be accomplished in a decision-directed manner by use of a PLL. From the
optimum demodulator for CPM signals, which is described in Section 5-3, we 
can generate an error signal that is filtered in a loop filter whose output drives 
a PLL. 


6-2-5 Non-Decision-Directed Loops 

Instead of using a decision-directed scheme to obtain the phase estimate, we
may treat the data as random variables and simply average $\Lambda(\phi)$ over
these random variables prior to maximization. In order to carry out this
integration, we may use either the actual probability distribution function of
the data, if it is known, or some assumed probability distribution that is
a reasonable approximation to the true distribution. The following example
illustrates the first approach.


Example 6-2-2 

Suppose the real signal $s(t)$ carries binary modulation. Then, in a signal
interval, we have

$$ s(t) = A\cos 2\pi f_c t, \qquad 0 \le t \le T $$

where $A = \pm 1$ with equal probability. Clearly, the pdf of $A$ is given as

$$ p(A) = \tfrac12\delta(A - 1) + \tfrac12\delta(A + 1) $$

Now, the likelihood function $\Lambda(\phi)$ given by (6-2-9) is conditional on a
given value of $A$ and must be averaged over the two values. Thus,

$$
\begin{aligned}
\bar{\Lambda}(\phi) &= \int_{-\infty}^{\infty} \Lambda(\phi)\,p(A)\,dA \\
  &= \tfrac12\exp\left[\frac{2}{N_0}\int_0^T r(t)\cos(2\pi f_c t + \phi)\,dt\right]
   + \tfrac12\exp\left[-\frac{2}{N_0}\int_0^T r(t)\cos(2\pi f_c t + \phi)\,dt\right] \\
  &= \cosh\left[\frac{2}{N_0}\int_0^T r(t)\cos(2\pi f_c t + \phi)\,dt\right]
\end{aligned}
$$

and the corresponding log-likelihood function is

$$ \bar{\Lambda}_L(\phi) = \ln\cosh\left[\frac{2}{N_0}\int_0^T r(t)\cos(2\pi f_c t + \phi)\,dt\right] \tag{6-2-44} $$

If we differentiate $\bar{\Lambda}_L(\phi)$ and set the derivative equal to zero,
we obtain the non-decision-directed ML estimate. Unfortunately, the
functional relationship in (6-2-44) is highly nonlinear and, hence, an exact
solution is difficult to obtain. On the other hand, approximations are
possible. In particular,

$$ \ln\cosh x \approx \begin{cases} \tfrac12 x^2 & (|x| \ll 1) \\ |x| & (|x| \gg 1) \end{cases} \tag{6-2-45} $$

With these approximations, the solution for $\phi$ becomes tractable.
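
The quality of the two approximations in (6-2-45) is easy to check
numerically. The following sketch is only an illustration (the helper names
are hypothetical). For large $|x|$ the exact value is $|x| - \ln 2$ plus an
exponentially small term, so the approximation $|x|$ is off by a constant,
which does not move the location of the maximum.

```python
import math


def ln_cosh(x):
    """Numerically stable ln(cosh(x)) = |x| + ln(1 + exp(-2|x|)) - ln 2."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def small_x_approx(x):
    """Quadratic approximation, valid for |x| << 1."""
    return 0.5 * x * x


def large_x_approx(x):
    """Linear approximation, valid for |x| >> 1 up to the constant ln 2."""
    return abs(x)


for x in (0.05, 0.2, 1.0, 5.0, 20.0):
    print(f"x={x:6.2f}  ln cosh x={ln_cosh(x):10.5f}  "
          f"x^2/2={small_x_approx(x):10.5f}  |x|={large_x_approx(x):10.5f}")
```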

In this example, we averaged over the two possible values of the 
information symbol. When the information symbols are M-valued, where M is 
large, the averaging operation yields highly nonlinear functions of the 
parameter to be estimated. In such a case, we may simplify the problem by 
assuming that the information symbols are continuous random variables. For 
example, we may assume that the symbols are zero-mean gaussian. The 
following example illustrates this approximation and the resulting form for the 
average likelihood function.

Example 6-2-3

Let us consider the same signal as in Example 6-2-2, but now we assume
that the amplitude $A$ is zero-mean gaussian with unit variance. Thus,

$$ p(A) = \frac{1}{\sqrt{2\pi}}\,e^{-A^2/2} $$

If we average $\Lambda(\phi)$ over the assumed pdf of $A$, we obtain the average
likelihood $\bar{\Lambda}(\phi)$ in the form

$$ \bar{\Lambda}(\phi) = C\exp\left\{\left[\frac{2}{N_0}\int_0^T r(t)\cos(2\pi f_c t + \phi)\,dt\right]^2\right\} \tag{6-2-46} $$

and the corresponding log-likelihood as

$$ \bar{\Lambda}_L(\phi) = \left[\frac{2}{N_0}\int_0^T r(t)\cos(2\pi f_c t + \phi)\,dt\right]^2 \tag{6-2-47} $$

We can obtain the ML estimate of $\phi$ by differentiating $\bar{\Lambda}_L(\phi)$
and setting the derivative to zero.


It is interesting to note that the log-likelihood function is quadratic under
the gaussian assumption and that it is approximately quadratic, as indicated in
(6-2-45), for small values of the cross-correlation of $r(t)$ with $s(t; \phi)$.
In other words, if the cross-correlation over a single interval is small, the
gaussian assumption for the distribution of the information symbols yields a
good approximation to the log-likelihood function.

In view of these results, we may use the gaussian approximation on all the
symbols in the observation interval $T_0 = KT$. Specifically, we assume that the
$K$ information symbols are statistically independent and identically
distributed. By averaging the likelihood function $\Lambda(\phi)$ over the
gaussian pdf for each of the $K$ symbols in the interval $T_0 = KT$, we obtain
the result

$$ \bar{\Lambda}(\phi) = C\exp\left\{\sum_{n=0}^{K-1}\left[\frac{2}{N_0}\int_{nT}^{(n+1)T} r(t)\cos(2\pi f_c t + \phi)\,dt\right]^2\right\} \tag{6-2-48} $$

If we take the logarithm of (6-2-48), differentiate the resulting log-likelihood
function, and set the derivative equal to zero, we obtain the condition for the
ML estimate as

$$ \sum_{n=0}^{K-1}\int_{nT}^{(n+1)T} r(t)\cos(2\pi f_c t + \hat{\phi})\,dt \int_{nT}^{(n+1)T} r(t)\sin(2\pi f_c t + \hat{\phi})\,dt = 0 \tag{6-2-49} $$

FIGURE 6-2-11 Non-decision-directed PLL for carrier phase estimation of PAM signals.


Although this equation can be manipulated further, its present form suggests
the tracking loop configuration illustrated in Fig. 6-2-11. This loop resembles a
Costas loop, which is described below. We note that the multiplication of the
two signals from the integrators destroys the sign carried by the information
symbols. The summer plays the role of the loop filter. In a tracking loop
configuration, the summer may be implemented either as a sliding-window
digital filter (summer) or as a lowpass digital filter with exponential weighting
of the past data.
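
Condition (6-2-49) can also be solved in one shot over a block of $K$ symbols
instead of being tracked by a loop. The sketch below is a sampled-data
illustration under stated assumptions: binary PAM with rectangular pulses, an
integer number of carrier cycles per symbol, white gaussian noise samples, and
hypothetical function names. Writing $c_n$ and $s_n$ for the per-symbol sums of
$r\cos(\omega k)$ and $r\sin(\omega k)$, the correlation with
$\cos(\omega k + \phi)$ is $c_n\cos\phi - s_n\sin\phi$, and the sum of squares
is maximized at $\hat{\phi} = -\tfrac12\,\mathrm{atan2}(2\sum c_n s_n, \sum (c_n^2 - s_n^2))$.

```python
import math
import random


def nda_phase_estimate(r, cycles_per_symbol, samples_per_symbol):
    """Block solution of the non-decision-directed condition for binary PAM."""
    w = 2.0 * math.pi * cycles_per_symbol / samples_per_symbol  # rad per sample
    num = 0.0
    den = 0.0
    for start in range(0, len(r) - samples_per_symbol + 1, samples_per_symbol):
        idx = range(start, start + samples_per_symbol)
        c = sum(r[k] * math.cos(w * k) for k in idx)  # in-phase correlation
        s = sum(r[k] * math.sin(w * k) for k in idx)  # quadrature correlation
        num += 2.0 * c * s        # the product c*s removes the data sign
        den += c * c - s * s
    return -0.5 * math.atan2(num, den)


def make_pam(num_symbols, cycles_per_symbol, samples_per_symbol, phase, noise_std, rng):
    """Binary PAM on a carrier with unknown phase plus white gaussian noise."""
    w = 2.0 * math.pi * cycles_per_symbol / samples_per_symbol
    r = []
    for n in range(num_symbols):
        a = rng.choice((-1.0, 1.0))
        for k in range(n * samples_per_symbol, (n + 1) * samples_per_symbol):
            r.append(a * math.cos(w * k + phase) + rng.gauss(0.0, noise_std))
    return r


rng = random.Random(1)
true_phase = 0.4
r = make_pam(200, 4, 32, true_phase, 0.5, rng)
phase_hat = nda_phase_estimate(r, 4, 32)
# The estimate is only defined modulo pi (the 180-degree ambiguity).
phase_error = (phase_hat - true_phase + math.pi / 2) % math.pi - math.pi / 2
print(phase_hat, phase_error)
```

The factor $\tfrac12$ in the estimate is where the $180^\circ$ ambiguity
enters: the data signs have been removed by the products $c_n s_n$, so
$\phi$ and $\phi + \pi$ are indistinguishable.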

In a similar manner, one can derive non-decision-directed ML phase
estimates for QAM and M-PSK. The starting point is to average the likelihood
function given by (6-2-9) over the statistical characteristics of the data. Here 
again, we may use the gaussian approximation (two-dimensional gaussian for 
complex-valued information symbols) in averaging over the information 
sequence. 


Squaring Loop The squaring loop is a non-decision-directed loop that is
widely used in practice to establish the carrier phase of double-sideband
suppressed carrier signals such as PAM. To describe its operation, consider the
problem of estimating the carrier phase of the digitally modulated PAM signal
of the form

$$ s(t) = A(t)\cos(2\pi f_c t + \phi) \tag{6-2-50} $$

where $A(t)$ carries the digital information. Note that $E[s(t)] = E[A(t)] = 0$
when the signal levels are symmetric about zero. Consequently, the average
value of $s(t)$ does not produce any phase-coherent frequency components at
any frequency, including the carrier. One method for generating a carrier from
the received signal is to square the signal and, thus, to generate a frequency
component at $2f_c$, which can be used to drive a phase-locked loop (PLL) tuned
to $2f_c$. This method is illustrated in the block diagram shown in Fig. 6-2-12.

FIGURE 6-2-12 Carrier recovery using a square-law device.


The output of the square-law device is

$$ s^2(t) = A^2(t)\cos^2(2\pi f_c t + \phi) = \tfrac12 A^2(t) + \tfrac12 A^2(t)\cos(4\pi f_c t + 2\phi) \tag{6-2-51} $$

Since the modulation is a cyclostationary stochastic process, the expected value
of $s^2(t)$ is

$$ E[s^2(t)] = \tfrac12 E[A^2(t)] + \tfrac12 E[A^2(t)]\cos(4\pi f_c t + 2\phi) \tag{6-2-52} $$

Hence, there is power at the frequency $2f_c$.

If the output of the square-law device is passed through a bandpass filter
tuned to the double-frequency term in (6-2-51), the mean value of the filter
output is a sinusoid with frequency $2f_c$, phase $2\phi$, and amplitude
$\tfrac12 E[A^2(t)]H(2f_c)$, where $H(2f_c)$ is the gain of the filter at
$f = 2f_c$. Thus, the square-law device has produced a periodic component from
the input signal $s(t)$. In effect, the squaring of $s(t)$ has removed the sign
information contained in $A(t)$ and, thus, has resulted in phase-coherent
frequency components at twice the carrier. The filtered frequency component at
$2f_c$ is then used to drive the PLL.
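
The appearance of a spectral line at $2f_c$ after squaring can be verified
numerically. The sketch below is an illustration under simple assumptions: a
noiseless sampled binary PAM signal with rectangular pulses, an integer number
of carrier cycles per symbol, and a balanced data sequence. The helper
`spectral_line` is hypothetical and computes a single normalized DFT
coefficient.

```python
import cmath
import math
import random


def spectral_line(x, cycles_per_sample):
    """Normalized single-frequency DFT: (1/N) * sum x[k] * exp(-j 2 pi f k)."""
    return sum(v * cmath.exp(-2j * math.pi * cycles_per_sample * k)
               for k, v in enumerate(x)) / len(x)


Ns, cycles, K, phi = 32, 4, 100, 0.3      # samples/symbol, cycles/symbol, symbols, phase
fc = cycles / Ns                          # carrier frequency in cycles per sample
rng = random.Random(7)
levels = [1.0] * (K // 2) + [-1.0] * (K // 2)
rng.shuffle(levels)                       # balanced +/-1 data in random order

s = [a * math.cos(2.0 * math.pi * fc * k + phi)
     for n, a in enumerate(levels)
     for k in range(n * Ns, (n + 1) * Ns)]

line_fc = spectral_line(s, fc)                        # carrier line of s(t)
line_2fc = spectral_line([v * v for v in s], 2 * fc)  # line of s^2(t) at 2 fc
phi_hat = 0.5 * cmath.phase(line_2fc)                 # phase 2*phi, halved

print(abs(line_fc), abs(line_2fc), phi_hat)
```

The modulated signal has no line at $f_c$, while its square has a line of
amplitude $\tfrac12 E[A^2]$ (a complex coefficient of magnitude $\tfrac14$ here)
whose phase is $2\phi$, in agreement with (6-2-52).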

The squaring operation leads to a noise enhancement that increases the 
noise power level at the input to the PLL and results in an increase in the 
variance of the phase error. 

To elaborate on this point, let the input to the squarer be $s(t) + n(t)$, where
$s(t)$ is given by (6-2-50) and $n(t)$ represents the bandpass additive gaussian
noise process. By squaring $s(t) + n(t)$, we obtain

$$ y(t) = s^2(t) + 2s(t)n(t) + n^2(t) \tag{6-2-53} $$

where $s^2(t)$ is the desired signal component and the other two components are
the signal × noise and noise × noise terms. By computing the autocorrelation
functions and power density spectra of these two noise components, one can
easily show that both components have spectral power in the frequency band
centered at $2f_c$. Consequently, the bandpass filter with bandwidth $B_{bp}$
centered at $2f_c$, which produces the desired sinusoidal signal component that
drives the PLL, also passes noise due to these two terms.

Since the bandwidth of the loop is designed to be significantly smaller than
the bandwidth $B_{bp}$ of the bandpass filter, the total noise spectrum at the
input to the PLL may be approximated as a constant within the loop bandwidth.
This approximation allows us to obtain a simple expression for the variance of
the phase error as

$$ \sigma_{\hat{\phi}}^2 = \frac{1}{\gamma_L S_L} \tag{6-2-54} $$

where $S_L$ is called the squaring loss and is given by

$$ S_L = \left(1 + \frac{B_{bp}/2B_{eq}}{\gamma_L}\right)^{-1} \tag{6-2-55} $$

Since $S_L < 1$, $S_L^{-1}$ represents the increase in the variance of the phase
error caused by the added noise (noise × noise terms) that results from the
squarer. Note, for example, that when $\gamma_L = B_{bp}/2B_{eq}$, the loss is 3 dB.
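
As a quick numerical check of (6-2-54) and (6-2-55), the following sketch
evaluates the squaring loss for a hypothetical pair of bandwidths (the values
of `B_bp` and `B_eq` are assumptions chosen only for illustration) at the loop
SNR for which the text quotes a 3 dB loss.

```python
import math


def squaring_loss(gamma_L, B_bp, B_eq):
    """Squaring loss S_L of (6-2-55); a value between 0 and 1."""
    return 1.0 / (1.0 + (B_bp / (2.0 * B_eq)) / gamma_L)


def phase_error_variance(gamma_L, B_bp, B_eq):
    """Phase-error variance of (6-2-54) in rad^2."""
    return 1.0 / (gamma_L * squaring_loss(gamma_L, B_bp, B_eq))


B_bp, B_eq = 2000.0, 50.0          # hypothetical bandwidths in Hz
gamma_L = B_bp / (2.0 * B_eq)      # loop SNR at which S_L = 1/2
S_L = squaring_loss(gamma_L, B_bp, B_eq)
loss_dB = -10.0 * math.log10(S_L)
print(S_L, loss_dB, phase_error_variance(gamma_L, B_bp, B_eq))
```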

Finally, we observe that the output of the VCO from the squaring loop must
be frequency-divided by 2 to generate the phase-locked carrier for signal
demodulation. It should be noted that the output of the frequency divider has 
a phase ambiguity of 180° relative to the phase of the received signal. For this 
reason, the binary data must be differentially encoded prior to transmission 
and differentially decoded at the receiver. 

Costas Loop Another method for generating a properly phased carrier for
a double-sideband suppressed carrier signal is illustrated by the block diagram
shown in Fig. 6-2-13. This scheme was developed by Costas (1956) and is
called the Costas loop. The received signal is multiplied by
$\cos(2\pi f_c t + \hat{\phi})$ and $\sin(2\pi f_c t + \hat{\phi})$, which are
outputs from the VCO. The two products are

$$
\begin{aligned}
y_c(t) &= [s(t) + n(t)]\cos(2\pi f_c t + \hat{\phi}) \\
       &= \tfrac12[A(t) + n_c(t)]\cos\Delta\phi + \tfrac12 n_s(t)\sin\Delta\phi + \text{double-frequency terms} \\
y_s(t) &= [s(t) + n(t)]\sin(2\pi f_c t + \hat{\phi}) \\
       &= \tfrac12[A(t) + n_c(t)]\sin\Delta\phi - \tfrac12 n_s(t)\cos\Delta\phi + \text{double-frequency terms}
\end{aligned}
\tag{6-2-56}
$$

where the phase error $\Delta\phi = \hat{\phi} - \phi$. The double-frequency terms
are eliminated by the lowpass filters following the multiplications.

FIGURE 6-2-13 Block diagram of Costas loop.

An error signal is generated by multiplying the two outputs of the lowpass
filters. Thus,

$$
\begin{aligned}
e(t) &= \tfrac18\{[A(t) + n_c(t)]^2 - n_s^2(t)\}\sin(2\Delta\phi) \\
     &\quad - \tfrac14 n_s(t)[A(t) + n_c(t)]\cos(2\Delta\phi)
\end{aligned}
\tag{6-2-57}
$$

This error signal is filtered by the loop filter, whose output is the control
voltage that drives the VCO. The reader should note the similarity of the
Costas loop to the PLL shown in Fig. 6-2-11.

We note that the error signal into the loop filter consists of the desired term
$A^2(t)\sin 2(\hat{\phi} - \phi)$ plus terms that involve signal × noise and
noise × noise. These terms are similar to the two noise terms at the input to
the PLL for the squaring method. In fact, if the loop filter in the Costas loop
is identical to that used in the squaring loop, the two loops are equivalent.
Under this condition, the probability density function of the phase error and
the performance of the two loops are identical.
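
The locking behavior of the Costas loop can be illustrated with a
discrete-time, complex-baseband sketch. This is a simplification and not the
passband loop of Fig. 6-2-13: the lowpass filter outputs are modeled as the
real and imaginary parts of one derotated sample per symbol, the loop filter
and VCO are replaced by a single accumulator with a hypothetical gain `step`,
and the error is formed with the sign that drives the estimate toward the
carrier phase.

```python
import cmath
import math
import random


def costas_track(samples, step, phase0=0.0):
    """First-order baseband Costas loop; returns the phase estimate per sample."""
    phase_hat = phase0
    history = []
    for z in samples:
        y = z * cmath.exp(-1j * phase_hat)  # derotate by the current estimate
        error = y.real * y.imag             # about 0.5*A^2*sin(2*(phase - phase_hat))
        phase_hat += step * error           # accumulator stands in for filter + VCO
        history.append(phase_hat)
    return history


rng = random.Random(3)
true_phase = 0.6
samples = [rng.choice((-1.0, 1.0)) * cmath.exp(1j * true_phase)
           + complex(rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1))
           for _ in range(2000)]
history = costas_track(samples, step=0.05)
# Compare modulo pi because of the 180-degree ambiguity.
final_error = (history[-1] - true_phase + math.pi / 2) % math.pi - math.pi / 2
print(history[-1], final_error)
```

The product of the in-phase and quadrature outputs is insensitive to the sign
of the data symbol, which is why the loop locks without decisions and also
why it cannot distinguish $\phi$ from $\phi + \pi$.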

It is interesting to note that the optimum lowpass filter for rejecting the
double-frequency terms in the Costas loop is a filter matched to the signal
pulse in the information-bearing signal. If matched filters are employed for the
lowpass filters, their outputs could be sampled at the bit rate, at the end of
each signal interval, and the discrete-time signal samples could be used to drive
the loop. The use of the matched filter results in a smaller noise into the loop.

Finally, we note that, as in the squaring PLL, the output of the VCO
contains a phase ambiguity of 180°, necessitating differential
encoding of the data prior to transmission and differential decoding at the
demodulator.


Carrier Estimation for Multiple Phase Signals When the digital information
is transmitted via M-phase modulation of a carrier, the methods described
above can be generalized to provide the properly phased carrier for
demodulation. The received M-phase signal, excluding the additive noise, may
be expressed as

$$ s(t) = A\cos\left[2\pi f_c t + \phi + \frac{2\pi}{M}(m - 1)\right], \qquad m = 1, 2, \ldots, M \tag{6-2-58} $$

where $2\pi(m - 1)/M$ represents the information-bearing component of the
signal phase. The problem in carrier recovery is to remove the
information-bearing component and, thus, to obtain the unmodulated carrier
$\cos(2\pi f_c t + \phi)$. One method by which this can be accomplished is
illustrated in Fig. 6-2-14, which represents a generalization of the squaring
loop. The signal is passed through an Mth-power-law device, which generates a
number of harmonics of $f_c$. The bandpass filter selects the harmonic
$\cos(2\pi M f_c t + M\phi)$ for driving the PLL. The term

$$ \frac{2\pi}{M}(m - 1)M = 2\pi(m - 1) \equiv 0 \pmod{2\pi}, \qquad m = 1, 2, \ldots, M $$

Thus, the information is removed. The VCO output is
$\sin(2\pi M f_c t + M\hat{\phi})$, so this output is divided in frequency by $M$
to yield $\sin(2\pi f_c t + \hat{\phi})$, and phase-shifted by $\tfrac12\pi$ rad
to yield $\cos(2\pi f_c t + \hat{\phi})$. These components are then fed
to the demodulator. Although not explicitly shown, there is a phase ambiguity
in these reference sinusoids of $360^\circ/M$, which can be overcome by
differential encoding of the data at the transmitter and differential decoding
after demodulation at the receiver.

FIGURE 6-2-14 Carrier recovery with Mth-power-law device for M-ary PSK.

Just as in the case of the squaring PLL, the Mth-power PLL operates in the
presence of noise that has been enhanced by the Mth-power-law device, which
results in the output

$$ y(t) = [s(t) + n(t)]^M $$

The variance of the phase error in the PLL resulting from the additive noise
may be expressed in the simple form

$$ \sigma_{\hat{\phi}}^2 = \frac{S_{ML}^{-1}}{\gamma_L} \tag{6-2-59} $$

where $\gamma_L$ is the loop SNR and $S_{ML}^{-1}$ is the M-phase power loss.
$S_{ML}$ has been evaluated by Lindsey and Simon (1973) for $M = 4$ and 8.
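
The removal of the modulation by the Mth-power operation, described above for
the loop of Fig. 6-2-14, can be illustrated at complex baseband. The sketch
below is a block (feedforward) analogue and not the PLL itself; it assumes one
noisy complex sample per symbol, and the function name is hypothetical. Raising
$e^{j[\phi + 2\pi(m-1)/M]}$ to the Mth power leaves $e^{jM\phi}$ for every $m$,
so the angle of the average, divided by $M$, estimates $\phi$ modulo $2\pi/M$.

```python
import cmath
import math
import random


def mth_power_phase(samples, M):
    """Average the Mth power of the samples and divide the angle by M."""
    acc = sum(z ** M for z in samples)   # the data phase is a multiple of 2*pi here
    return cmath.phase(acc) / M


M = 8
rng = random.Random(11)
true_phase = 0.2
samples = []
for _ in range(500):
    m = rng.randrange(1, M + 1)                      # symbol index m = 1..M
    data_phase = 2.0 * math.pi * (m - 1) / M
    noise = complex(rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.05))
    samples.append(cmath.exp(1j * (true_phase + data_phase)) + noise)

phase_hat = mth_power_phase(samples, M)
sector = 2.0 * math.pi / M                           # size of the phase ambiguity
phase_error = (phase_hat - true_phase + sector / 2) % sector - sector / 2
print(phase_hat, phase_error)
```

The noise is raised to the Mth power together with the signal, which is the
noise enhancement that (6-2-59) accounts for through $S_{ML}^{-1}$.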

Another method for carrier recovery in M-ary PSK is based on a
generalization of the Costas loop. That method requires multiplying the
received signal by $M$ phase-shifted carriers of the form

$$ \sin\left[2\pi f_c t + \hat{\phi} + \frac{\pi}{M}(k - 1)\right], \qquad k = 1, 2, \ldots, M $$

lowpass-filtering each product, and then multiplying the outputs of the lowpass
filters to generate the error signal. The error signal excites the loop filter,
which, in turn, provides the control signal for the VCO. This method is
relatively complex to implement and, consequently, has not been generally
used in practice.


Comparison of Decision-Directed with Non-Decision-Directed Loops 

We note that the decision-feedback phase-locked loop (DFPLL) differs from
the Costas loop only in the method by which $A(t)$ is rectified for the
purpose of removing the modulation. In the Costas loop, each of the two
quadrature signals used to rectify $A(t)$ is corrupted by noise. In the DFPLL,
only one of the signals used to rectify $A(t)$ is corrupted by noise. On the
other hand, the squaring loop is similar to the Costas loop in terms of the
noise effect on the estimate $\hat{\phi}$. Consequently, the DFPLL is superior in
performance to both the Costas loop and the squaring loop, provided that
the demodulator is operating at error rates below $10^{-2}$, where an occasional
decision error has a negligible effect on $\hat{\phi}$. Quantitative comparisons
of the variance of the phase errors in a Costas loop to those in a DFPLL have
been made by Lindsey and Simon (1973), and show that the variance of the
DFPLL is 4-10 times smaller for signal-to-noise ratios per bit above 0 dB.


6-3 SYMBOL TIMING ESTIMATION 

In a digital communication system, the output of the demodulator must be
sampled periodically at the symbol rate, at the precise sampling time instants
$t_m = mT + \tau$, where $T$ is the symbol interval and $\tau$ is a nominal time
delay that accounts for the propagation time of the signal from the transmitter
to the receiver. To perform this periodic sampling, we require a clock signal at
the receiver. The process of extracting such a clock signal at the receiver is
usually called symbol synchronization or timing recovery.

Timing recovery is one of the most critical functions that is performed at the
receiver of a synchronous digital communication system. We should note that
the receiver must know not only the frequency ($1/T$) at which the outputs of
the matched filters or correlators are sampled, but also where to take the
samples within each symbol interval. The choice of sampling instant within the
symbol interval of duration $T$ is called the timing phase.

Symbol synchronization can be accomplished in one of several ways. In 
some communication systems, the transmitter and receiver clocks are syn- 
chronized to a master clock, which provides a very precise timing signal. In this 
case, the receiver must estimate and compensate for the relative time delay 
between the transmitted and received signals. Such may be the case for radio 
communication systems that operate in the very low frequency (VLF) band 
(below 30 kHz), where precise clock signals are transmitted from a master 
radio station. 

Another method for achieving symbol synchronization is for the transmitter
to simultaneously transmit the clock frequency $1/T$ or a multiple of $1/T$ along
with the information signal. The receiver may simply employ a narrowband 
filter tuned to the transmitted clock frequency and, thus, extract the clock 
signal for sampling. This approach has the advantage of being simple to 
implement. There are several disadvantages, however. One is that the 
transmitter must allocate some of its available power to the transmission of the 
clock signal. Another is that some small fraction of the available channel 
bandwidth must be allocated for the transmission of the clock signal. In spite of 
these disadvantages, this method is frequently used in telephone transmission 
systems that employ large bandwidths to transmit the signals of many users. In 
such a case, the transmission of a clock signal is shared in the demodulation of 
the signals among the many users. Through this shared use of the clock signal, 
the penalty in transmitter power and in bandwidth allocation is reduced 
proportionally by the number of users. 

A clock signal can also be extracted from the received data signal. There are 
a number of different methods that can be used at the receiver to achieve 
self-synchronization. In this section, we treat both decision-directed and 
non-decision-directed methods. 


6-3-1 Maximum-Likelihood Timing Estimation 

Let us begin by obtaining the ML estimate of the time delay $\tau$. If the signal
is a baseband PAM waveform, it is represented as

$$ r(t) = s(t; \tau) + n(t) \tag{6-3-1} $$

where

$$ s(t; \tau) = \sum_n I_n g(t - nT - \tau) \tag{6-3-2} $$

As in the case of ML phase estimation, we distinguish between two types of
timing estimators, decision-directed timing estimators and
non-decision-directed estimators. In the former, the information symbols from
the output of the demodulator are treated as the known transmitted sequence. In
this case, the log-likelihood function has the form

$$ \Lambda_L(\tau) = C_L \int_{T_0} r(t)s(t; \tau)\,dt \tag{6-3-3} $$

If we substitute (6-3-2) into (6-3-3), we obtain

$$
\begin{aligned}
\Lambda_L(\tau) &= C_L \sum_n I_n \int_{T_0} r(t)g(t - nT - \tau)\,dt \\
                &= C_L \sum_n I_n y_n(\tau)
\end{aligned}
\tag{6-3-4}
$$

where $y_n(\tau)$ is defined as

$$ y_n(\tau) = \int_{T_0} r(t)g(t - nT - \tau)\,dt \tag{6-3-5} $$

A necessary condition for $\hat{\tau}$ to be the ML estimate of $\tau$ is that

$$ \frac{d\Lambda_L(\tau)}{d\tau} = C_L \sum_n I_n \frac{d}{d\tau}[y_n(\tau)] = 0 \tag{6-3-6} $$

The result in (6-3-6) suggests the implementation of the tracking loop shown
in Fig. 6-3-1. We should observe that the summation in the loop serves as the
loop filter whose bandwidth is controlled by the length of the sliding window in
the summation. The output of the loop filter drives the voltage-controlled clock
(VCC), or voltage-controlled oscillator, which controls the sampling times for
the input to the loop. Since the detected information sequence $\{I_n\}$ is used
in the estimation of $\tau$, the estimate is decision-directed.

FIGURE 6-3-1 Decision-directed ML estimation of timing for baseband PAM.

The techniques described above for ML timing estimation of baseband
PAM signals can be extended to carrier modulated signal formats such as
QAM and PSK in a straightforward manner, by dealing with the equivalent
lowpass form of the signals. Thus, the problem of ML estimation of symbol
timing for carrier signals is very similar to the problem formulation for the
baseband PAM signal.
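
Instead of tracking the zero of (6-3-6) with a loop, the log-likelihood
(6-3-4) can be maximized directly over a grid of candidate delays. The sketch
below does this for sampled binary PAM. It is an illustration under stated
assumptions: the delay is an integer number of samples, the decisions are
error-free, the pulse is a one-symbol raised-cosine-shaped (Hann) pulse chosen
only for convenience, and all function names are hypothetical.

```python
import math
import random


def pulse(samples_per_symbol):
    """Hann-shaped pulse g of one symbol duration (an assumed pulse shape)."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * k / samples_per_symbol))
            for k in range(samples_per_symbol)]


def pam_signal(symbols, g, delay, total_len):
    """s(t; tau) = sum_n I_n g(t - nT - tau) on the sample grid."""
    x = [0.0] * total_len
    Ns = len(g)
    for n, a in enumerate(symbols):
        for k, gk in enumerate(g):
            x[n * Ns + delay + k] += a * gk
    return x


def dd_timing_estimate(r, decisions, g):
    """Maximize sum_n I_n y_n(tau) over integer delays 0..Ns-1."""
    Ns = len(g)
    best_delay, best_metric = 0, -math.inf
    for d in range(Ns):
        metric = 0.0
        for n, a in enumerate(decisions):
            start = n * Ns + d
            y_n = sum(r[start + k] * g[k] for k in range(Ns))  # correlator output
            metric += a * y_n
        if metric > best_metric:
            best_delay, best_metric = d, metric
    return best_delay


rng = random.Random(5)
Ns, K, true_delay = 16, 100, 5
g = pulse(Ns)
symbols = [rng.choice((-1.0, 1.0)) for _ in range(K)]
clean = pam_signal(symbols, g, true_delay, (K + 1) * Ns)
noisy = [v + rng.gauss(0.0, 0.3) for v in clean]
delay_clean = dd_timing_estimate(clean, symbols, g)
delay_noisy = dd_timing_estimate(noisy, symbols, g)
print(delay_clean, delay_noisy)
```

In a tracking loop the grid search is replaced by the derivative in (6-3-6),
and the sum over $n$ becomes the sliding-window loop filter.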


6-3-2 Non-Decision-Directed Timing Estimation 

A non-decision-directed timing estimate can be obtained by averaging the
likelihood ratio $\Lambda(\tau)$ over the pdf of the information symbols, to
obtain $\bar{\Lambda}(\tau)$, and then differentiating either
$\bar{\Lambda}(\tau)$ or $\ln\bar{\Lambda}(\tau) = \bar{\Lambda}_L(\tau)$ to
obtain the condition for the maximum-likelihood estimate $\hat{\tau}_{ML}$.

In the case of binary (baseband) PAM, where $I_n = \pm 1$ with equal
probability, the average over the data yields

$$ \bar{\Lambda}_L(\tau) = \sum_n \ln\cosh C y_n(\tau) \tag{6-3-7} $$

just as in the case of the phase estimator. Since $\ln\cosh x \approx \tfrac12 x^2$
for small $x$, the square-law approximation

$$ \bar{\Lambda}_L(\tau) \approx \tfrac12 C^2 \sum_n y_n^2(\tau) \tag{6-3-8} $$

is appropriate for low signal-to-noise ratios. For multilevel PAM, we may
approximate the statistical characteristics of the information symbols $\{I_n\}$
by the gaussian pdf, with zero mean and unit variance. When we average
$\Lambda(\tau)$ over the gaussian pdf, the logarithm of $\bar{\Lambda}(\tau)$ is
identical to $\bar{\Lambda}_L(\tau)$ given by (6-3-8). Consequently, the
non-decision-directed estimate of $\tau$ may be obtained by differentiating
(6-3-8). The result is an approximation to the ML estimate of the delay time.
Setting the derivative of (6-3-8) to zero and dropping the constant factor
$\tfrac12 C^2$ gives

$$ \frac{d}{d\tau}\sum_n y_n^2(\tau) = 2\sum_n y_n(\tau)\frac{dy_n(\tau)}{d\tau} = 0 \tag{6-3-9} $$

where $y_n(\tau)$ is given by (6-3-5).

An implementation of a tracking loop based on the derivative of
$\bar{\Lambda}_L(\tau)$ given by (6-3-7) is shown in Fig. 6-3-2. Alternatively,
an implementation of a tracking loop based on (6-3-9) is illustrated in Fig.
6-3-3. In both structures, we observe that the summation serves as the loop
filter that drives the VCC. It is interesting to note the resemblance of the
timing loop in Fig. 6-3-3 to the Costas loop for phase estimation.

FIGURE 6-3-2 Non-decision-directed estimation of timing for binary baseband PAM.

FIGURE 6-3-3

Early-Late Gate Synchronizers Another non-decision-directed timing
estimator exploits the symmetry properties of the signal at the output of the
matched filter or correlator. To describe this method, let us consider the
rectangular pulse $s(t)$, $0 \le t \le T$, shown in Fig. 6-3-4(a). The output of
the filter matched to $s(t)$ attains its maximum value at time $t = T$, as shown
in Fig. 6-3-4(b). Thus, the output of the matched filter is the time
autocorrelation function of the pulse $s(t)$. Of course, this statement holds
for any arbitrary pulse shape, so the approach that we describe applies in
general to any signal pulse. Clearly, the proper time to sample the output of
the matched filter for a maximum output is at $t = T$, i.e., at the peak of the
correlation function.

FIGURE 6-3-4 Rectangular signal pulse (a) and its matched filter output (b).

In the presence of noise, the identification of the peak value of the signal is
generally difficult. Instead of sampling the signal at the peak, suppose we
sample early, at $t = T - \delta$, and late, at $t = T + \delta$. The absolute
values of the early samples $|y(mT - \delta)|$ and the late samples
$|y(mT + \delta)|$ will be smaller (on the average in the presence of noise)
than the samples of the peak value $|y(mT)|$. Since the autocorrelation function
is even with respect to the optimum sampling time $t = T$, the absolute values of
the correlation function at $t = T - \delta$ and $t = T + \delta$ are equal.
Under this condition, the proper sampling time is the midpoint between
$t = T - \delta$ and $t = T + \delta$. This condition forms the basis for the
early-late gate symbol synchronizer.

FIGURE 6-3-5 Block diagram of early-late gate synchronizer.

Figure 6-3-5 illustrates the block diagram of an early-late gate synchronizer. 
In this figure, correlators are used in place of the equivalent matched filters. 
The two correlators integrate over the symbol interval $T$, but one correlator
starts integrating $\delta$ seconds early relative to the estimated optimum sampling
time and the other integrator starts integrating $\delta$ seconds late relative to the
estimated optimum sampling time. An error signal is formed by taking the
difference between the absolute values of the two correlator outputs. To 
smooth the noise corrupting the signal samples, the error signal is passed 
through a lowpass filter. If the timing is off relative to the optimum sampling 
time, the average error signal at the output of the lowpass filter is nonzero, and 
the clock signal is either retarded or advanced, depending on the sign of the 
error. Thus, the smoothed error signal is used to drive a voltage-controlled 
clock (VCC), whose output is the desired clock signal that is used for sampling. 
The output of the VCC is also used as a clock signal for a symbol waveform 
generator that puts out the same basic pulse waveform as that of the 
transmitting filter. This pulse waveform is advanced and delayed and then fed 
to the two correlators, as shown in Fig. 6-3-5. Note that if the signal pulses are 
rectangular, there is no need for a signal pulse generator within the tracking 
loop. 
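
The action of the early-late error signal can be illustrated with a noiseless
sketch for the rectangular pulse, whose matched-filter output is a triangle
peaking at $t = T$. The model is deliberately minimal and rests on assumptions:
there is no noise and no adjacent-symbol contribution, the lowpass filter and
VCC are replaced by a single accumulator with a hypothetical gain `step`, and
all names are illustrative.

```python
def matched_filter_output(t, T):
    """Normalized response to a rectangular pulse of duration T: peak 1 at t = T."""
    return max(0.0, 1.0 - abs(t - T) / T)


def early_late_error(sample_time, delta, T):
    """Difference of the absolute early and late samples around sample_time."""
    early = abs(matched_filter_output(sample_time - delta, T))
    late = abs(matched_filter_output(sample_time + delta, T))
    return early - late


def track(initial_offset, delta, T, step, iterations):
    """Move the sampling instant until the early and late samples balance."""
    offset = initial_offset
    trace = [offset]
    for _ in range(iterations):
        err = early_late_error(T + offset, delta, T)
        offset -= step * err   # positive error means we sample late, so advance
        trace.append(offset)
    return trace


T, delta = 1.0, 0.1
trace = track(initial_offset=0.3, delta=delta, T=T, step=0.1, iterations=300)
print(trace[0], trace[-1])
```

The sign of the error tells the loop whether to advance or retard the clock,
and the error vanishes only when the early and late samples straddle the peak
symmetrically.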

We observe that the early-late gate synchronizer is basically a closed-loop
control system whose bandwidth is relatively narrow compared to the symbol
rate $1/T$. The bandwidth of the loop determines the quality of the timing
estimate. A narrowband loop provides more averaging over the additive noise
and, thus, improves the quality of the estimated sampling instants, provided
that the channel propagation delay is constant and the clock oscillator at the
transmitter is not drifting with time (or drifting very slowly with time). On the
other hand, if the channel propagation delay is changing with time and/or the
transmitter clock is also drifting with time, then the bandwidth of the loop must
be increased to provide for faster tracking of time variations in symbol timing.

FIGURE 6-3-6 Block diagram of early-late gate synchronizer: an alternative form.

In the tracking mode, the two correlators are affected by adjacent symbols. 
However, if the sequence of information symbols has zero mean, as is the case 
for PAM and some other signal modulations, the contribution to the output of 
the correlators from adjacent symbols averages out to zero in the lowpass filter. 

An equivalent realization of the early-late gate synchronizer that is 
somewhat easier to implement is shown in Fig. 6-3-6. In this case the clock 
signal from the VCC is advanced and delayed by $\delta$, and these clock signals are
used to sample the outputs of the two correlators. 

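The early-late gate synchronizer described above is a non-decision-directed
estimator of symbol timing that approximates the maximum-likelihood
estimator. This assertion can be demonstrated by approximating the derivative of
the log-likelihood function by the finite difference, i.e.,

$$ \frac{d\bar{\Lambda}_L(\tau)}{d\tau} \approx \frac{\bar{\Lambda}_L(\tau + \delta) - \bar{\Lambda}_L(\tau - \delta)}{2\delta} \tag{6-3-10} $$

If we substitute for $\bar{\Lambda}_L(\tau)$ from (6-3-8) into (6-3-10), we obtain
the approximation for the derivative as

$$
\begin{aligned}
\frac{d\bar{\Lambda}_L(\tau)}{d\tau}
  &\approx \frac{C^2}{4\delta}\sum_n \left[y_n^2(\tau + \delta) - y_n^2(\tau - \delta)\right] \\
  &= \frac{C^2}{4\delta}\sum_n \left\{\left[\int_{T_0} r(t)g(t - nT - \tau - \delta)\,dt\right]^2
     - \left[\int_{T_0} r(t)g(t - nT - \tau + \delta)\,dt\right]^2\right\}
\end{aligned}
\tag{6-3-11}
$$

The mathematical expression in (6-3-11) basically describes the functions
performed by the early-late gate symbol synchronizers illustrated in Figs. 6-3-5
and 6-3-6.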


6-4 JOINT ESTIMATION OF CARRIER PHASE 
AND SYMBOL TIMING 

The estimation of the carrier phase and symbol timing may be accomplished
separately, as described above, or jointly. Joint ML estimation of two or more
signal parameters yields estimates that are at least as good as, and usually
better than, the estimates obtained from separate optimization of the
likelihood function. In other words, the variances of the signal parameter
estimates obtained from joint optimization are less than or equal to the
variances of the parameter estimates obtained from separately optimizing the
likelihood function.

Let us consider the joint estimation of the carrier phase and symbol timing.
The log-likelihood function for these two parameters may be expressed in
terms of the equivalent lowpass signals as

$$ \Lambda_L(\phi, \tau) = \mathrm{Re}\left[\frac{1}{N_0}\int_{T_0} r(t)s_l^*(t; \phi, \tau)\,dt\right] \tag{6-4-1} $$

where $s_l(t; \phi, \tau)$ is the equivalent lowpass signal, which has the general
form

$$ s_l(t; \phi, \tau) = e^{-j\phi}\left[\sum_n I_n g(t - nT - \tau) + j\sum_n J_n w(t - nT - \tau)\right] \tag{6-4-2} $$

where $\{I_n\}$ and $\{J_n\}$ are the two information sequences.

We note that, for PAM, we may set $J_n = 0$ for all $n$, and the sequence
$\{I_n\}$ is real. For QAM and PSK, we set $J_n = 0$ for all $n$ and the sequence
$\{I_n\}$ is complex-valued. For offset QPSK, both sequences $\{I_n\}$ and
$\{J_n\}$ are nonzero and $w(t) = g(t - \tfrac12 T)$.

For decision-directed ML estimation of $\phi$ and $\tau$, the log-likelihood
function becomes

$$ \Lambda_L(\phi, \tau) = \mathrm{Re}\left\{\frac{e^{j\phi}}{N_0}\sum_n \left[I_n^* y_n(\tau) - jJ_n^* x_n(\tau)\right]\right\} \tag{6-4-3} $$

where

$$
\begin{aligned}
y_n(\tau) &= \int_{T_0} r(t)g^*(t - nT - \tau)\,dt \\
x_n(\tau) &= \int_{T_0} r(t)w^*(t - nT - \tau)\,dt
\end{aligned}
\tag{6-4-4}
$$

Necessary conditions for the estimates of $\phi$ and $\tau$ to be the ML
estimates are

$$ \frac{\partial\Lambda_L(\phi, \tau)}{\partial\phi} = 0, \qquad \frac{\partial\Lambda_L(\phi, \tau)}{\partial\tau} = 0 \tag{6-4-5} $$

It is convenient to define

$$ A(\tau) + jB(\tau) = \frac{1}{N_0}\sum_n \left[I_n^* y_n(\tau) - jJ_n^* x_n(\tau)\right] \tag{6-4-6} $$

With this definition, (6-4-3) may be expressed in the simple form

$$ \Lambda_L(\phi, \tau) = A(\tau)\cos\phi - B(\tau)\sin\phi \tag{6-4-7} $$

Now the conditions in (6-4-5) for the joint ML estimates become

$$ \frac{\partial\Lambda_L(\phi, \tau)}{\partial\phi} = -A(\tau)\sin\phi - B(\tau)\cos\phi = 0 \tag{6-4-8} $$

$$ \frac{\partial\Lambda_L(\phi, \tau)}{\partial\tau} = \frac{\partial A(\tau)}{\partial\tau}\cos\phi - \frac{\partial B(\tau)}{\partial\tau}\sin\phi = 0 \tag{6-4-9} $$

From (6-4-8), we obtain

$$ \hat{\phi}_{ML} = -\tan^{-1}\left[\frac{B(\hat{\tau}_{ML})}{A(\hat{\tau}_{ML})}\right] \tag{6-4-10} $$

The solution to (6-4-9) that incorporates (6-4-10) is

$$ \left[A(\tau)\frac{\partial A(\tau)}{\partial\tau} + B(\tau)\frac{\partial B(\tau)}{\partial\tau}\right]_{\tau = \hat{\tau}_{ML}} = 0 \tag{6-4-11} $$

The decision-directed tracking loop for QAM (or PSK) obtained from these
equations is illustrated in Fig. 6-4-1.

FIGURE 6-4-1 Decision-directed joint tracking loop for carrier phase and symbol timing in QAM and PSK.

Offset QPSK requires a slightly more complex structure for joint estimation
of $\phi$ and $\tau$. The structure is easily derived from (6-4-6)-(6-4-11).

In addition to the joint estimates given above, it is also possible to derive
non-decision-directed estimates of the carrier phase and symbol timing,
although we shall not pursue this approach.

We should also mention that one can combine the parameter estimation
problem with the demodulation of the information sequence $\{I_n\}$. Thus, one
can consider the joint maximum-likelihood estimation of $\{I_n\}$, the carrier
phase $\phi$, and the symbol timing parameter $\tau$. Results on these joint
estimation problems have appeared in the technical literature, e.g., Kobayashi
(1971), Falconer (1976), and Falconer and Salz (1977).
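
The decision-directed joint estimate of this section can be sketched for QPSK
($J_n = 0$) as a block computation. For a fixed $\tau$, the maximum of
$A(\tau)\cos\phi - B(\tau)\sin\phi$ over $\phi$ is $\sqrt{A^2(\tau) + B^2(\tau)}$,
and (6-4-11) is the stationarity condition of $A^2(\tau) + B^2(\tau)$, so one may
search for the delay that maximizes $|A + jB|$ and then apply (6-4-10). The code
below is an illustration under stated assumptions: integer-sample delays,
error-free decisions, a Hann-shaped pulse, the sign convention $e^{-j\phi}$ of
(6-4-2), the constant $1/N_0$ dropped, and hypothetical function names.

```python
import cmath
import math
import random


def pulse(samples_per_symbol):
    """Hann-shaped real pulse g of one symbol duration (an assumed pulse shape)."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * k / samples_per_symbol))
            for k in range(samples_per_symbol)]


def received(symbols, g, delay, phase, noise_std, rng):
    """r = exp(-j*phase) * sum_n I_n g(t - nT - tau) + complex gaussian noise."""
    Ns = len(g)
    r = [0j] * ((len(symbols) + 1) * Ns)
    rot = cmath.exp(-1j * phase)
    for n, a in enumerate(symbols):
        for k, gk in enumerate(g):
            r[n * Ns + delay + k] += rot * a * gk
    return [v + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
            for v in r]


def joint_estimate(r, decisions, g):
    """Return (delay, phase) maximizing |A + jB| and applying phase = -atan2(B, A)."""
    Ns = len(g)
    best = (-1.0, 0, 0.0)
    for d in range(Ns):
        ab = 0j                                   # A(tau) + jB(tau), scaled by N0
        for n, a in enumerate(decisions):
            start = n * Ns + d
            y_n = sum(r[start + k] * g[k] for k in range(Ns))
            ab += a.conjugate() * y_n
        if abs(ab) > best[0]:
            best = (abs(ab), d, -math.atan2(ab.imag, ab.real))
    return best[1], best[2]


rng = random.Random(9)
Ns, K, true_delay, true_phase = 16, 60, 5, 0.7
g = pulse(Ns)
qpsk = (1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j)
symbols = [rng.choice(qpsk) for _ in range(K)]
r = received(symbols, g, true_delay, true_phase, 0.2, rng)
delay_hat, phase_hat = joint_estimate(r, symbols, g)
print(delay_hat, phase_hat)
```

Note that the timing search uses only the magnitude of $A + jB$, so it does not
require the phase to be known first; the phase estimate then follows from the
same correlations at no extra cost.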


6-5 PERFORMANCE CHARACTERISTICS OF ML 
ESTIMATORS 

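The quality of a signal parameter estimate is usually measured in terms of its
bias and its variance. In order to define these terms, let us assume that we have
a sequence of observations $[x_1 \; x_2 \; \cdots \; x_n] = \mathbf{x}$, with pdf $p(\mathbf{x} \mid \phi)$, from
which we extract an estimate of a parameter $\phi$. The bias of an estimate, say
$\hat{\phi}(\mathbf{x})$, is defined as

$$\text{bias} = E[\hat{\phi}(\mathbf{x})] - \phi$$ (6-5-1)

where $\phi$ is the true value of the parameter. When $E[\hat{\phi}(\mathbf{x})] = \phi$, we say that the
estimate is unbiased. The variance of the estimate $\hat{\phi}(\mathbf{x})$ is defined as

$$\sigma_{\hat{\phi}}^2 = E\{[\hat{\phi}(\mathbf{x})]^2\} - \{E[\hat{\phi}(\mathbf{x})]\}^2$$ (6-5-2)

In general $\sigma_{\hat{\phi}}^2$ may be difficult to compute. However, a well-known result in
parameter estimation (see Helstrom, 1968) is the Cramér-Rao lower bound on
the mean square error defined as

$$E\{[\hat{\phi}(\mathbf{x}) - \phi]^2\} \ge \frac{\left[ \dfrac{\partial}{\partial \phi} E[\hat{\phi}(\mathbf{x})] \right]^2}{E\left\{ \left[ \dfrac{\partial}{\partial \phi} \ln p(\mathbf{x} \mid \phi) \right]^2 \right\}}$$ (6-5-3)

Note that when the estimate is unbiased, the numerator of (6-5-3) is unity
and the bound becomes a lower bound on the variance $\sigma_{\hat{\phi}}^2$ of the estimate
$\hat{\phi}(\mathbf{x})$, i.e.,

$$\sigma_{\hat{\phi}}^2 \ge \frac{1}{E\left\{ \left[ \dfrac{\partial}{\partial \phi} \ln p(\mathbf{x} \mid \phi) \right]^2 \right\}}$$ (6-5-4)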


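Since $\ln p(\mathbf{x} \mid \phi)$ differs from the log-likelihood function by a constant factor
independent of $\phi$, it follows that

$$E\left\{ \left[ \frac{\partial}{\partial \phi} \ln p(\mathbf{x} \mid \phi) \right]^2 \right\} = E\left\{ \left[ \frac{\partial}{\partial \phi} \ln \Lambda(\phi) \right]^2 \right\} = -E\left\{ \frac{\partial^2}{\partial \phi^2} \ln \Lambda(\phi) \right\}$$ (6-5-5)

Therefore, the lower bound on the variance is

$$\sigma_{\hat{\phi}}^2 \ge \frac{1}{E\left\{ \left[ \dfrac{\partial}{\partial \phi} \ln \Lambda(\phi) \right]^2 \right\}} = -\frac{1}{E\left\{ \dfrac{\partial^2}{\partial \phi^2} \ln \Lambda(\phi) \right\}}$$ (6-5-6)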

This lower bound is a very useful result. It provides a benchmark for 
comparing the variance of any practical estimate to the lower bound. Any 
estimate that is unbiased and whose variance attains the lower bound is called 
an efficient estimate.

In general, efficient estimates are rare. When they exist, they are maximum- 
likelihood estimates. A well-known result from parameter estimation theory is 
that any ML parameter estimate is asymptotically (arbitrarily large number of 
observations) unbiased and efficient. To a large extent, these desirable 
properties constitute the importance of ML parameter estimates. It is also known
that an ML estimate is asymptotically gaussian-distributed [with mean $\phi$ and
variance equal to the lower bound given by (6-5-6)].

In the case of the ML estimates described in this chapter for the two signal
parameters, their variance is generally inversely proportional to the signal-to-noise
ratio, or, equivalently, inversely proportional to the signal power
multiplied by the observation interval $T_0$. Furthermore, the variance of the
decision-directed estimates, at low error probabilities, is generally lower than
the variance of non-decision-directed estimates. In fact, the performance of the
ML decision-directed estimates for $\phi$ and $\tau$ attains the lower bound.

The following example is concerned with the evaluation of the Cramér-Rao
lower bound for the ML estimate of the carrier phase. 

Example 6-5-1 

The ML estimate of the phase of an unmodulated carrier was shown in
(6-2-11) to satisfy the condition

$$\int_{T_0} r(t) \sin(2\pi f_c t + \hat{\phi}_{ML}) \, dt = 0$$ (6-5-7)

where

$$r(t) = s(t; \phi) + n(t) = A \cos(2\pi f_c t + \phi) + n(t)$$ (6-5-8)

The condition in (6-5-7) was derived by maximizing the log likelihood function

$$\Lambda_L(\phi) = \frac{2}{N_0} \int_{T_0} r(t) s(t; \phi) \, dt$$ (6-5-9)
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FIGURE 6-5-1 


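The variance of $\hat{\phi}_{ML}$ is lower-bounded as

$$\sigma_{\hat{\phi}_{ML}}^2 \ge \left\{ \frac{2A}{N_0} \int_{T_0} E[r(t)] \cos(2\pi f_c t + \phi) \, dt \right\}^{-1} = \frac{N_0}{A^2 T_0} = \frac{N_0 / 2T_0}{A^2 / 2} = \frac{N_0 B_{eq}}{A^2 / 2}$$ (6-5-10)

The factor $1/2T_0$ is simply the (one-sided) equivalent noise bandwidth $B_{eq}$ of the
ideal integrator.

From this example, we observe that the variance of the ML phase estimate
is lower-bounded as

$$\sigma_{\hat{\phi}_{ML}}^2 \ge \frac{1}{\gamma_L}$$ (6-5-11)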

where $\gamma_L = A^2 / 2N_0 B_{eq}$ is the loop SNR. This is also the variance obtained for
the phase estimate from a PLL with decision-directed estimation. As we have
already observed, non-decision-directed estimates do not perform as well due
to losses in the nonlinearities required to remove the modulation, e.g., the
squaring loss and the $M$th-power loss.
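
To make the bound concrete, the short sketch below evaluates $N_0/(A^2 T_0)$ and the loop SNR $\gamma_L = A^2/2N_0B_{eq}$ with $B_{eq} = 1/2T_0$. The function names and the numerical values are assumptions chosen only for illustration; they do not come from the example.

```python
import math

def phase_crlb(amplitude, n0, t_obs):
    """Lower bound N0 / (A^2 T0) on the phase-estimate variance, in rad^2."""
    return n0 / (amplitude ** 2 * t_obs)

def loop_snr(amplitude, n0, t_obs):
    """Loop SNR A^2 / (2 N0 B_eq), with B_eq = 1 / (2 T0) for the ideal integrator."""
    b_eq = 1.0 / (2.0 * t_obs)
    return amplitude ** 2 / (2.0 * n0 * b_eq)

# Assumed illustrative values: A = 1, N0 = 1e-3, T0 = 0.1 s.
bound = phase_crlb(1.0, 1e-3, 0.1)
snr = loop_snr(1.0, 1e-3, 0.1)
rms_deg = math.degrees(math.sqrt(bound))
print(f"variance bound = {bound:.4f} rad^2, loop SNR = {snr:.1f}, rms = {rms_deg:.2f} deg")
```

The product of the bound and the loop SNR is unity, which is simply the statement that the variance bound equals $1/\gamma_L$.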

Similar results can be obtained on the quality of the symbol timing estimates 
derived above. In addition to their dependence on the SNR, the quality of 
symbol timing estimates is a function of the signal pulse shape. For example, a 
pulse shape that is commonly used in practice is one that has a raised cosine 
spectrum (see Section 9-2). For such a pulse, the rms timing error ($\sigma_\tau$) as a
function of SNR is illustrated in Fig. 6-5-1, for both decision-directed and 


Performance of baseband symbol timing estimate for
fixed signal and loop bandwidths. [From
Synchronization Subsystems: Analysis and Design,
by L. Franks, 1983. Reprinted with permission of
the author.]






FIGURE 6-5-2 Performance of baseband symbol timing estimate for fixed
SNR and fixed loop bandwidth. [From Synchronization
Subsystems: Analysis and Design, by L. Franks, 1983.
Reprinted with permission of the author.]
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non-decision-directed estimates. Note the significant improvement in 
performance of the decision-directed estimate compared with the non-decision- 
directed estimate. Now, if the bandwidth of the pulse is varied, the pulse shape 
is changed and, hence, the rms value of the timing error also changes. For 
example, when the bandwidth of the pulse that has a raised cosine spectrum is 
varied, the rms timing error varies as shown in Fig. 6-5-2. Note that the error 
decreases as the bandwidth of the pulse increases. 

In conclusion, we have presented the ML method for signal parameter 
estimation and have applied it to the estimation of the carrier phase and 
symbol timing. We have also described their performance characteristics. 

6-6 BIBLIOGRAPHICAL NOTES AND REFERENCES 

Carrier recovery and timing synchronization are two topics that have been 
thoroughly investigated over the past three decades. The Costas loop was 
invented in 1956 and the decision-directed phase estimation methods were 
described by Proakis et al. (1964) and by Natali and Walbesser (1969). The
work on decision-directed estimation was motivated by earlier work of Price 
(1962a, b). Comprehensive treatments of phase-locked loops first appeared in 
the books by Viterbi (1966) and Gardner (1979). Books that cover carrier 
phase recovery and time synchronization techniques have been written by 
Stiffler (1971), Lindsey (1972), Lindsey and Simon (1973), and Meyr and 
Ascheid (1990). 

A number of tutorial papers have appeared in IEEE journals on the PLL 
and on time synchronization. We cite, for example, the paper by Gupta (1975), 
which treats both analog and digital implementation of PLLs, and the paper by 
Lindsey and Chie (1981), which is devoted to the analysis of digital PLLs. In 
addition, the tutorial paper by Franks (1980) describes both carrier phase and 
symbol synchronization methods, including methods based on the maximum- 
likelihood estimation criterion. The paper by Franks is contained in a special 
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issue of the IEEE Transactions on Communications (August 1980) devoted to 
synchronization. The paper by Mueller and Muller (1976) describes digital 
signal processing algorithms for extracting symbol timing. 

Application of the maximum-likelihood criterion to parameter estimation 
was first described in the context of radar parameter estimation (range and 
range rate). Subsequently, this optimal criterion was applied to carrier phase 
and symbol timing estimation as well as to joint parameter estimation with data 
symbols. Papers on these topics have been published by several researchers, 
including Falconer (1976), Mengali (1977), Falconer and Salz (1977), and 
Meyers and Franks (1980). 

The Cramér-Rao lower bound on the variance of a parameter estimate is
derived and evaluated in a number of standard texts on detection and 
estimation theory, such as Helstrom (1968) and Van Trees (1968). It is also 
described in several books on mathematical statistics, such as the book by 
Cramér (1946).


PROBLEMS 

6-1 Prove the relation (6-1-7). 

6-2 Sketch the equivalent realization of the binary PSK receiver in Fig. 6-1-1 that 
employs a matched filter instead of a correlator. 

6-3 Suppose that the loop filter [see (6-2-14)] for a PLL has the transfer function

$$G(s) = \frac{1}{s + \sqrt{2}}$$


a Determine the closed-loop transfer function H(s) and indicate if the loop is 
stable. 

b Determine the damping factor and the natural frequency of the loop. 

6-4 Consider the PLL for estimating the carrier phase of a signal in which the loop 
filter is specified as 


a Determine the closed-loop transfer function $H(s)$ and its gain at $f = 0$.
b For what range of values of $\tau_1$ and $K$ is the loop stable?

6-5 The loop filter G(s) in a PLL is implemented by the circuit shown in Fig. P6-5. 
Determine the system function $G(s)$ and express the time constants $\tau_1$ and $\tau_2$ in
terms of the circuit parameters.

FIGURE P6-5



FIGURE P6-6 
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6-6 The loop filter G(s) in a PLL is implemented with the active filter shown in Fig. 
P6-6. Determine the system function $G(s)$ and express the time constants $\tau_1$ and $\tau_2$
in terms of the circuit parameters.

6-7 Show that the early-late gate synchronizer illustrated in Fig. 6-3-5 is a close 
approximation to the timing recovery system illustrated in Fig. P6-7. 

6-8 Based on a ML criterion, determine a carrier phase estimation method for binary 
on-off keying modulation. 

6-9 In the transmission and reception of signals to and from moving vehicles, the 
transmitted signal frequency is shifted in direct proportion to the speed of the 
vehicle. The so-called Doppler frequency shift imparted to a signal that is received 
in a vehicle traveling at a velocity v relative to a (fixed) transmitter is given by the 
formula

$$f_D = \pm \frac{v}{\lambda}$$

where $\lambda$ is the wavelength, and the sign depends on the direction (moving toward
or moving away) that the vehicle is traveling relative to the transmitter. Suppose 
that a vehicle is traveling at a speed of 100 km/h relative to a base station in 
a mobile cellular communication system. The signal is a narrowband signal 
transmitted at a carrier frequency of 1 GHz. 
a Determine the Doppler frequency shift. 

b What should be the bandwidth of a Doppler frequency tracking loop if the loop 
is designed to track Doppler frequency shifts for vehicles traveling at speeds up 
to 100 km/h? 

c Suppose the transmitted signal bandwidth is 2 MHz centered at 1 GHz. 


FIGURE P6-7 
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Determine the Doppler frequency spread between the upper and lower 
frequencies in the signal. 

6-10 Show that the mean value of the ML estimate in (6-2-38) is 4>, i.e., that the 
estimate is unbiased. 

6-11 Determine the pdf of the ML phase estimate in (6-2-38). 

6-12 Determine the ML phase estimate for offset QPSK.

6-13 A single-sideband PAM signal may be represented as 

$$u_m(t) = A_m \left[ g_T(t) \cos 2\pi f_c t - \hat{g}_T(t) \sin 2\pi f_c t \right]$$

where $\hat{g}_T(t)$ is the Hilbert transform of $g_T(t)$ and $A_m$ is the amplitude level that
conveys the information. Demonstrate mathematically that a Costas loop can be
used to demodulate the SSB PAM signal.

6-14 A carrier component is transmitted on the quadrature carrier in a communication
system that transmits information via binary PSK. Hence, the received signal has
the form

$$r(t) = \pm\sqrt{2P_s} \cos(2\pi f_c t + \phi) + \sqrt{2P_c} \sin(2\pi f_c t + \phi) + n(t)$$

where $\phi$ is the carrier phase and $n(t)$ is AWGN. The unmodulated carrier
component is used as a pilot signal at the receiver to estimate the carrier phase.
a Sketch a block diagram of the receiver, including the carrier phase estimator.
b Illustrate mathematically the operations involved in the estimation of the carrier
phase $\phi$.

c Express the probability of error for the detection of the binary PSK signal as a
function of the total transmitted power $P_T = P_s + P_c$. What is the loss in
performance due to the allocation of a portion of the transmitted power to the
pilot signal? Evaluate the loss for $P_c/P_T = 0.1$.

6-15 Determine the signal and noise components at the input to a fourth-power ($M = 4$)
PLL that is used to generate the carrier phase for demodulation of QPSK. By
ignoring all noise components except those that are linear in the noise $n(t)$,
determine the variance of the phase estimate at the output of the PLL.

6-16 The probability of error for binary PSK demodulation and detection when there is
a carrier phase error $\phi_e$ is

$$P_2(\phi_e) = Q\left( \sqrt{\frac{2\mathcal{E}_b}{N_0} \cos^2 \phi_e} \right)$$

Suppose that the phase error from the PLL is modeled as a zero-mean gaussian
random variable with variance $\sigma_\phi^2 \ll \pi$. Determine the expression for the average
probability of error (in integral form).
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CHANNEL CAPACITY 
AND CODING 


In Chapter 5, we considered the problem of digital modulation by means of
$M = 2^k$ signal waveforms, where each waveform conveys $k$ bits of information.
We observed that some modulation methods provide better performance than
others. In particular, we demonstrated that orthogonal signaling waveforms
allow us to make the probability of error arbitrarily small by letting the
number of waveforms $M \to \infty$, provided that the SNR per bit $\gamma_b \ge -1.6$ dB.
Thus, we can operate at the capacity of the additive, white gaussian noise
channel in the limit as the bandwidth expansion factor $B_e = W/R \to \infty$. This is
a heavy price to pay, because $B_e$ grows exponentially with the block length $k$.
Such inefficient use of channel bandwidth is highly undesirable.

In this and the following chapter, we consider signal waveforms generated 
from either binary or nonbinary sequences. The resulting waveforms are 
generally characterized by a bandwidth expansion factor that grows only 
linearly with k. Consequently, coded waveforms offer the potential for greater 
bandwidth efficiency than orthogonal $M$-ary waveforms. We shall observe that,
in general, coded waveforms offer performance advantages not only in
power-limited applications where $R/W < 1$, but also in bandwidth-limited
systems where $R/W > 1$.

We begin by establishing several channel models that will be used to 
evaluate the benefits of channel coding, and we shall introduce the concept of 
channel capacity for the various channel models. Then, we treat the subject of 
code design for efficient communications. 


374 
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7-1 CHANNEL MODELS AND CHANNEL CAPACITY 

In the model of a digital communication system described in Section 1-1, we 
recall that the transmitter building blocks consist of the discrete-input, 
discrete-output channel encoder followed by the modulator. The function of 
the discrete channel encoder is to introduce, in a controlled manner, some 
redundancy in the binary information sequence, which can be used at the 
receiver to overcome the effects of noise and interference encountered in the 
transmission of the signal through the channel. The encoding process generally 
involves taking $k$ information bits at a time and mapping each $k$-bit sequence
into a unique n-bit sequence, called a code word. The amount of redundancy 
introduced by the encoding of the data in this manner is measured by the ratio 
n/k. The reciprocal of this ratio, namely k/n, is called the code rate. 

The binary sequence at the output of the channel encoder is fed to the 
modulator, which serves as the interface to the communication channel. As we 
have discussed, the modulator may simply map each binary digit into one of 
two possible waveforms, i.e., a 0 is mapped into $s_1(t)$ and a 1 is mapped into
$s_2(t)$. Alternatively, the modulator may transmit $q$-bit blocks at a time by using
$M = 2^q$ possible waveforms.

At the receiving end of the digital communication system, the demodulator 
processes the channel-corrupted waveform and reduces each waveform to a 
scalar or a vector that represents an estimate of the transmitted data symbol 
(binary or $M$-ary). The detector, which follows the demodulator, may decide
on whether the transmitted bit is a 0 or a 1. In such a case, the detector has
made a hard decision. If we view the decision process at the detector as a form
of quantization, we observe that a hard decision corresponds to binary
quantization of the demodulator output. More generally, we may consider a
detector that quantizes to $Q > 2$ levels, i.e., a $Q$-ary detector. If $M$-ary signals
are used then $Q \ge M$. In the extreme case when no quantization is performed,
$Q = \infty$. In the case where $Q > M$, we say that the detector has made a soft
decision. 

The quantized output from the detector is then fed to the channel decoder, 
which exploits the available redundancy to correct for channel disturbances. 

In the following sections, we describe three channel models that will be used 
to establish the maximum achievable bit rate for the channel. 


7-1-1 Channel Models 

In this section we describe channel models that will be useful in the design of 
codes. The simplest is the binary symmetric channel (BSC), which corresponds 
to the case with M = 2 and hard decisions at the detector. 


Binary Symmetric Channel Let us consider an additive noise channel and 
let the modulator and the demodulator/detector be included as parts of the 
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FIGURE 7-1-1 



A composite discrete-input, discrete-output channel formed by including the modulator and the 
demodulator/detector as part of the channel. 


channel. If the modulator employs binary waveforms and the detector makes 
hard decisions, then the composite channel, shown in Fig. 7-1-1, has a 
discrete-time binary input sequence and a discrete-time binary output 
sequence. Such a composite channel is characterized by the set $X = \{0, 1\}$ of
possible inputs, the set $Y = \{0, 1\}$ of possible outputs, and a set of
conditional probabilities that relate the possible outputs to the possible inputs.
If the channel noise and other disturbances cause statistically independent
errors in the transmitted binary sequence with average probability $p$, then

$$P(Y = 0 \mid X = 1) = P(Y = 1 \mid X = 0) = p$$
$$P(Y = 1 \mid X = 1) = P(Y = 0 \mid X = 0) = 1 - p$$ (7-1-1)

Thus, we have reduced the cascade of the binary modulator, the waveform 
channel, and the binary demodulator and detector into an equivalent discrete- 
time channel which is represented by the diagram shown in Fig. 7-1-2. This 
binary-input, binary-output, symmetric channel is simply called a binary 
symmetric channel (BSC). Since each output bit from the channel depends only 
on the corresponding input bit, we say that the channel is memoryless. 


Discrete Memoryless Channels The BSC is a special case of a more
general discrete-input, discrete-output channel. Suppose that the outputs from
the channel encoder are $q$-ary symbols, i.e., $X = \{x_0, x_1, \ldots, x_{q-1}\}$, and the
output of the detector consists of $Q$-ary symbols, where $Q \ge M = 2^q$. If the



FIGURE 7-1-2 Binary symmetric channel. 
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FIGURE 7-1-3 


Discrete $q$-ary input, $Q$-ary output channel.



channel and the modulation are memoryless, then the input-output 
characteristics of the composite channel, shown in Fig. 7-1-1, are described by 
a set of qQ conditional probabilities 

$$P(Y = y_i \mid X = x_j) \equiv P(y_i \mid x_j)$$ (7-1-2)

where $i = 0, 1, \ldots, Q - 1$ and $j = 0, 1, \ldots, q - 1$. Such a channel is called a
discrete memoryless channel (DMC), and its graphical representation is shown
in Fig. 7-1-3. Hence, if the input to a DMC is a sequence of $n$ symbols
$u_1, u_2, \ldots, u_n$ selected from the alphabet $X$ and the corresponding output is
the sequence $v_1, v_2, \ldots, v_n$ of symbols from the alphabet $Y$, the joint
conditional probability is

$$P(Y_1 = v_1, Y_2 = v_2, \ldots, Y_n = v_n \mid X_1 = u_1, \ldots, X_n = u_n) = \prod_{k=1}^{n} P(Y = v_k \mid X = u_k)$$ (7-1-3)

This expression is simply a mathematical statement of the memoryless
condition.

In general, the conditional probabilities $\{P(y_i \mid x_j)\}$ that characterize a DMC
can be arranged in the matrix form $\mathbf{P} = [p_{ji}]$, where, by definition,
$p_{ji} \equiv P(y_i \mid x_j)$. $\mathbf{P}$ is called the probability transition matrix for the channel.
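
The memoryless condition can be illustrated with a few lines of code. In the sketch below, the transition matrix is stored as a nested list with rows indexed by the input symbol and columns by the output symbol; the function name `sequence_probability` and the BSC with $p = 0.1$ are assumptions made only for this example.

```python
def sequence_probability(transition, inputs, outputs):
    """P(v_1..v_n | u_1..u_n) for a DMC, with transition[j][i] = P(y_i | x_j)."""
    prob = 1.0
    for u, v in zip(inputs, outputs):
        # Memoryless channel: each output depends only on its own input symbol.
        prob *= transition[u][v]
    return prob

p = 0.1                                  # assumed crossover probability
bsc = [[1 - p, p], [p, 1 - p]]           # transition matrix of a BSC
prob = sequence_probability(bsc, [0, 1, 1, 0], [0, 1, 0, 0])
print(f"P(0100 received | 0110 sent) = {prob:.4f}")
```

Here one of the four symbols is received in error, so the product contains one factor $p$ and three factors $1 - p$.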


Discrete-Input, Continuous-Output Channel Now, suppose that the input 
to the modulator comprises symbols selected from a finite and discrete input 
alphabet $X = \{x_0, x_1, \ldots, x_{q-1}\}$ and the output of the detector is unquantized
($Q = \infty$). Then, the input to the channel decoder can assume any value on the
real line, i.e., $Y = \{-\infty, \infty\}$. This leads us to define a composite discrete-time
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memoryless channel that is characterized by the discrete input X, the 
continuous output Y, and the set of conditional probability density functions 

$$p(y \mid X = x_k), \qquad k = 0, 1, \ldots, q - 1$$

The most important channel of this type is the additive white gaussian noise 
channel (AWGN), for which 

Y = X + G (7-1-4) 

where $G$ is a zero-mean gaussian random variable with variance $\sigma^2$ and
$X = x_k$, $k = 0, 1, \ldots, q - 1$. For a given $X$, it follows that $Y$ is gaussian with
mean $x_k$ and variance $\sigma^2$. That is,

$$p(y \mid X = x_k) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(y - x_k)^2 / 2\sigma^2}$$ (7-1-5)

For any given input sequence, $X_i$, $i = 1, 2, \ldots, n$, there is a corresponding
output sequence

$$Y_i = X_i + G_i, \qquad i = 1, 2, \ldots, n$$ (7-1-6)

The condition that the channel is memoryless may be expressed as

$$p(y_1, y_2, \ldots, y_n \mid X_1 = u_1, X_2 = u_2, \ldots, X_n = u_n) = \prod_{i=1}^{n} p(y_i \mid X_i = u_i)$$ (7-1-7)

Waveform Channels We may separate the modulator and demodulator 
from the physical channel, and consider a channel model in which the inputs 
are waveforms and the outputs are waveforms. Let us assume that such a 
channel has a given bandwidth $W$, with ideal frequency response $C(f) = 1$
within the bandwidth $W$, and the signal at its output is corrupted by additive
white gaussian noise. Suppose that $x(t)$ is a band-limited input to such a
channel and $y(t)$ is the corresponding output. Then,

$$y(t) = x(t) + n(t)$$ (7-1-8)

where $n(t)$ represents a sample function of the additive noise process. A
suitable method for defining a set of probabilities that characterize the channel
is to expand $x(t)$, $y(t)$, and $n(t)$ into a complete set of orthonormal functions.
That is, we express $x(t)$, $y(t)$, and $n(t)$ in the form

$$y(t) = \sum_i y_i f_i(t)$$
$$x(t) = \sum_i x_i f_i(t)$$ (7-1-9)
$$n(t) = \sum_i n_i f_i(t)$$

where $\{y_i\}$, $\{x_i\}$, and $\{n_i\}$ are the sets of coefficients in the corresponding
expansions, e.g.,

$$y_i = \int_0^T y(t) f_i^*(t) \, dt = \int_0^T [x(t) + n(t)] f_i^*(t) \, dt = x_i + n_i$$ (7-1-10)

The functions $\{f_i(t)\}$ form a complete orthonormal set over the interval
$(0, T)$, i.e.,

$$\int_0^T f_i(t) f_j^*(t) \, dt = \delta_{ij} = \begin{cases} 1 & (i = j) \\ 0 & (i \ne j) \end{cases}$$ (7-1-11)

where $\delta_{ij}$ is the Kronecker delta function. Since the gaussian noise is white, any
complete set of orthonormal functions may be used in the expansions (7-1-9).

We may now use the coefficients in the expansion for characterizing the
channel. Since

$$y_i = x_i + n_i$$

where $n_i$ is gaussian, it follows that

$$p(y_i \mid x_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} e^{-(y_i - x_i)^2 / 2\sigma_i^2}, \qquad i = 1, 2, \ldots$$ (7-1-12)

Since the functions $\{f_i(t)\}$ in the expansion are orthonormal, it follows that the
$\{n_i\}$ are uncorrelated. Since they are gaussian, they are also statistically
independent. Hence,

$$p(y_1, y_2, \ldots, y_N \mid x_1, x_2, \ldots, x_N) = \prod_{i=1}^{N} p(y_i \mid x_i)$$ (7-1-13)

for any $N$. In this manner, the waveform channel is reduced to an equivalent
discrete-time channel characterized by the conditional pdf given in (7-1-12).

When the additive noise is white and gaussian with spectral density $\frac{1}{2}N_0$, the
variances $\sigma_i^2 = \frac{1}{2}N_0$ for all $i$ in (7-1-12). In this case, samples of $x(t)$ and $y(t)$
may be taken at the Nyquist rate of $2W$ samples/s, so that $x_i = x(i/2W)$ and
$y_i = y(i/2W)$. Since the noise is white, the noise samples are statistically
independent. Thus, (7-1-12) and (7-1-13) describe the statistics of the sampled
signal. We note that in a time interval of length $T$, there are $N = 2WT$ samples.
This parameter is used below in obtaining the capacity of the band-limited
AWGN waveform channel.

The choice of which channel model to use at any one time depends on our 
objectives. If we are interested in the design and analysis of the performance 





of the discrete channel encoder and decoder, it is appropriate to consider 
channel models in which the modulator and demodulator are a part of the 
composite channel. On the other hand, if our intent is to design and analyze 
the performance of the digital modulator and digital demodulator, we use a 
channel model for the waveform channel. 


7-1-2 Channel Capacity 

Now let us consider a DMC having an input alphabet $X = \{x_0, x_1, \ldots, x_{q-1}\}$,
an output alphabet $Y = \{y_0, y_1, \ldots, y_{Q-1}\}$, and the set of transition
probabilities $P(y_i \mid x_j)$ as defined in (7-1-2). Suppose that the symbol $x_j$ is
transmitted and the symbol $y_i$ is received. The mutual information provided
about the event $X = x_j$ by the occurrence of the event $Y = y_i$ is
$\log [P(y_i \mid x_j) / P(y_i)]$, where

$$P(y_i) \equiv P(Y = y_i) = \sum_{k=0}^{q-1} P(x_k) P(y_i \mid x_k)$$ (7-1-14)

Hence, the average mutual information provided by the output $Y$ about the
input $X$ is

$$I(X; Y) = \sum_{j=0}^{q-1} \sum_{i=0}^{Q-1} P(x_j) P(y_i \mid x_j) \log \frac{P(y_i \mid x_j)}{P(y_i)}$$ (7-1-15)

The channel characteristics determine the transition probabilities $P(y_i \mid x_j)$,
but the probabilities of the input symbols are under the control of the discrete
channel encoder. The value of $I(X; Y)$ maximized over the set of input symbol
probabilities $P(x_j)$ is a quantity that depends only on the characteristics of the
DMC through the conditional probabilities $P(y_i \mid x_j)$. This quantity is called the
capacity of the channel and is denoted by $C$. That is, the capacity of a DMC is
defined as

$$C = \max_{P(x_j)} I(X; Y) = \max_{P(x_j)} \sum_{j=0}^{q-1} \sum_{i=0}^{Q-1} P(x_j) P(y_i \mid x_j) \log \frac{P(y_i \mid x_j)}{P(y_i)}$$ (7-1-16)

The maximization of $I(X; Y)$ is performed under the constraints that

$$P(x_j) \ge 0, \qquad \sum_{j=0}^{q-1} P(x_j) = 1$$



The units of C are bits per input symbol into the channel (bits/channel use) 
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FIGURE 7-1-4 


FIGURE 7-1-5 


The capacity of a BSC as a function of the error 
probability p. 



when the logarithm is base 2, and nats/input symbol when the natural
logarithm (base $e$) is used. If a symbol enters the channel every $\tau_s$ seconds, the
channel capacity in bits/s or nats/s is $C/\tau_s$.
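
The definitions of average mutual information and capacity translate directly into a small numerical experiment. The sketch below is an illustration only: the function names are hypothetical, the capacity is found by a brute-force search over the probability of one input (not an efficient general method), and the asymmetric "Z-channel" is an assumed example that does not appear in the text.

```python
import math

def mutual_information(p_x, transition):
    """Average mutual information I(X;Y) in bits; transition[j][i] = P(y_i | x_j)."""
    n_out = len(transition[0])
    p_y = [sum(p_x[j] * transition[j][i] for j in range(len(p_x)))
           for i in range(n_out)]
    total = 0.0
    for j, pj in enumerate(p_x):
        for i in range(n_out):
            pij = transition[j][i]
            if pj > 0 and pij > 0:          # zero-probability terms contribute nothing
                total += pj * pij * math.log2(pij / p_y[i])
    return total

def binary_input_capacity(transition, steps=1000):
    """Brute-force maximization of I(X;Y) over P(x_0) for a two-input DMC."""
    return max(mutual_information([k / steps, 1 - k / steps], transition)
               for k in range(steps + 1))

# Assumed example: input 0 is always received correctly, input 1 is flipped
# with probability 0.5 (a Z-channel).
z_channel = [[1.0, 0.0], [0.5, 0.5]]
uniform = mutual_information([0.5, 0.5], z_channel)
cap = binary_input_capacity(z_channel)
print(f"I(X;Y) with equiprobable inputs = {uniform:.4f} bits")
print(f"capacity from grid search       = {cap:.4f} bits/channel use")
```

For this asymmetric channel the search settles on unequal input probabilities, so the capacity exceeds the mutual information obtained with equiprobable inputs.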


Example 7-1-1 

For the BSC with transition probabilities 

$$P(0 \mid 1) = P(1 \mid 0) = p$$

the average mutual information is maximized when the input probabilities
$P(0) = P(1) = \frac{1}{2}$. Thus, the capacity of the BSC is

$$C = 1 + p \log_2 p + (1 - p) \log_2 (1 - p) = 1 - H(p)$$ (7-1-17)

where $H(p)$ is the binary entropy function. A plot of $C$ versus $p$ is
illustrated in Fig. 7-1-4. Note that for $p = 0$, the capacity is 1 bit/channel use.
On the other hand, for $p = \frac{1}{2}$ the mutual information between input and
output is zero. Hence, the channel capacity is zero. For $\frac{1}{2} < p \le 1$, we may
reverse the position of 0 and 1 at the output of the BSC, so that $C$ becomes
symmetric with respect to the point $p = \frac{1}{2}$. In our treatment of binary
modulation and demodulation given in Chapter 5, we showed that $p$ is a
monotonic function of the signal-to-noise ratio (SNR) as illustrated in Fig.
7-1-5(a). Consequently, when $C$ is plotted as a function of the SNR, it
increases monotonically as the SNR increases. This characteristic behavior
of $C$ versus SNR is illustrated in Fig. 7-1-5(b).
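
The behavior described in this example is easy to tabulate. The following sketch evaluates $C = 1 - H(p)$ for a few error probabilities; the function names are hypothetical and the chosen values of $p$ are arbitrary.

```python
import math

def binary_entropy(p):
    """Binary entropy function H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of the BSC in bits/channel use: C = 1 - H(p)."""
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.01, 0.1, 0.5, 0.9):
    print(f"p = {p:4.2f}   C = {bsc_capacity(p):.4f} bits/channel use")
```

The values for $p = 0.1$ and $p = 0.9$ coincide, which reflects the symmetry of $C$ about $p = \frac{1}{2}$.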

General behavior of error probability and channel capacity as a
function of SNR.

Next let us consider the discrete-time AWGN memoryless channel described
by the transition probability density functions defined by (7-1-5). The
capacity of this channel in bits/channel use is the maximum, over the input
probabilities, of the average mutual information between the discrete input
$X = \{x_0, x_1, \ldots, x_{q-1}\}$ and the output $Y = \{-\infty, \infty\}$:

$$C = \max_{P(x_i)} \sum_{i=0}^{q-1} \int_{-\infty}^{\infty} p(y \mid x_i) P(x_i) \log_2 \frac{p(y \mid x_i)}{p(y)} \, dy$$ (7-1-18)

where

$$p(y) = \sum_{k=0}^{q-1} p(y \mid x_k) P(x_k)$$ (7-1-19)


Example 7-1-2 

Let us consider a binary-input AWGN memoryless channel with possible 
inputs X = A and X = -A. The average mutual information I(X;Y) is 
maximized when the input probabilities are $P(X = A) = P(X = -A) = \frac{1}{2}$.
Hence, the capacity of this channel in bits/channel use is

$$C = \frac{1}{2} \int_{-\infty}^{\infty} p(y \mid A) \log_2 \frac{p(y \mid A)}{p(y)} \, dy + \frac{1}{2} \int_{-\infty}^{\infty} p(y \mid -A) \log_2 \frac{p(y \mid -A)}{p(y)} \, dy$$ (7-1-20)

Figure 7-1-6 illustrates $C$ as a function of the ratio $A^2 / 2\sigma^2$. Note that $C$
increases monotonically from 0 to 1 bit/symbol as this ratio increases.
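
The capacity integral of this example has no closed form, but it is straightforward to evaluate numerically. The sketch below is one possible approach and rests on stated assumptions: by symmetry the two integrals are equal, so only the first is computed; the ratio $p(y \mid A)/p(y)$ with $p(y) = \frac{1}{2}[p(y \mid A) + p(y \mid -A)]$ simplifies to $2/(1 + e^{-2Ay/\sigma^2})$; and the integral is truncated to $A \pm 10\sigma$ and approximated by the trapezoidal rule. The function name and parameter defaults are hypothetical.

```python
import math

def bi_awgn_capacity(a, sigma, n_steps=4000, span=10.0):
    """Capacity (bits/channel use) of the binary-input AWGN channel."""
    lo, hi = a - span * sigma, a + span * sigma
    dy = (hi - lo) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        y = lo + k * dy
        pdf = math.exp(-((y - a) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
        # log2[p(y|A)/p(y)] = 1 - log2(1 + exp(-2*A*y/sigma^2))
        log_ratio = 1.0 - math.log1p(math.exp(-2 * a * y / sigma ** 2)) / math.log(2)
        weight = 0.5 if k in (0, n_steps) else 1.0   # trapezoidal end weights
        total += weight * pdf * log_ratio * dy
    return total

for ratio_db in (-10, 0, 5, 10):
    ratio = 10 ** (ratio_db / 10)                    # A^2 / (2 sigma^2)
    amplitude = math.sqrt(2 * ratio)                 # with sigma = 1
    print(f"10 log(A^2/2sigma^2) = {ratio_db:4d} dB   C = {bi_awgn_capacity(amplitude, 1.0):.4f}")
```

The printed values increase from near 0 toward 1 bit as the ratio grows, consistent with the monotonic behavior noted above.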

It is interesting to note that in the two channel models described above, the 
choice of equally probable input symbols maximizes the average mutual 
information. Thus, the capacity of the channel is obtained when the input 
symbols are equally probable. This is not always the solution for the capacity 
formulas given in (7-1-16) and (7-1-18), however. Nothing can be said in 
general about the input probability assignment that maximizes the average 
mutual information. However, in the two channel models considered above,


FIGURE 7-1-6 Channel capacity as a function of $A^2 / 2\sigma^2$ for binary-input AWGN
memoryless channel. [Horizontal axis: $10 \log (A^2 / 2\sigma^2)$ (dB).]
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the channel transition probabilities exhibit a form of symmetry that results in 
the maximum of $I(X; Y)$ being obtained when the input symbols are equally
probable. The symmetry condition can be expressed in terms of the elements
of the probability transition matrix $\mathbf{P}$ of the channel. When each row of this
matrix is a permutation of any other row and each column is a permutation of
any other column, the probability transition matrix is symmetric and input
symbols with equal probability maximize $I(X; Y)$.

In general, necessary and sufficient conditions for the set of input
probabilities $\{P(x_j)\}$ to maximize $I(X; Y)$ and, thus, to achieve capacity on a DMC
are that (Problem 7-1)

$$I(x_j; Y) = C \quad \text{for all } j \text{ with } P(x_j) > 0$$
$$I(x_j; Y) \le C \quad \text{for all } j \text{ with } P(x_j) = 0$$ (7-1-21)

where $C$ is the capacity of the channel and

$$I(x_j; Y) = \sum_{i=0}^{Q-1} P(y_i \mid x_j) \log \frac{P(y_i \mid x_j)}{P(y_i)}$$ (7-1-22)

Usually, it is relatively easy to check if the equally probable set of input
symbols satisfies the conditions (7-1-21). If it does not, then one must
determine the set of unequal probabilities $\{P(x_j)\}$ that satisfy (7-1-21).
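
A quick way to apply this test is to compute $I(x_j; Y)$ for every input and see whether the values agree. The sketch below does this for a BSC and for an assumed asymmetric two-input channel; the function name, the channel matrices, and the trial input probabilities are illustrative assumptions, not taken from the text.

```python
import math

def per_input_information(p_x, transition):
    """I(x_j; Y) in bits for each input j; transition[j][i] = P(y_i | x_j)."""
    n_out = len(transition[0])
    p_y = [sum(p_x[j] * transition[j][i] for j in range(len(p_x)))
           for i in range(n_out)]
    # Terms with P(y_i | x_j) = 0 contribute nothing to the sum.
    return [sum(row[i] * math.log2(row[i] / p_y[i])
                for i in range(n_out) if row[i] > 0)
            for row in transition]

bsc = [[0.9, 0.1], [0.1, 0.9]]            # symmetric channel
z_channel = [[1.0, 0.0], [0.5, 0.5]]      # asymmetric channel (assumed example)

info_bsc = per_input_information([0.5, 0.5], bsc)
info_z = per_input_information([0.5, 0.5], z_channel)
info_z_opt = per_input_information([0.6, 0.4], z_channel)
print("BSC, equiprobable inputs:      ", [round(v, 4) for v in info_bsc])
print("Z-channel, equiprobable inputs:", [round(v, 4) for v in info_z])
print("Z-channel, P(x_0) = 0.6:       ", [round(v, 4) for v in info_z_opt])
```

For the BSC the two values agree with equiprobable inputs, whereas for the asymmetric channel they agree only for the unequal assignment, and their common value is then the capacity.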

Now let us consider a band-limited waveform channel with additive white 
gaussian noise. Formally, the capacity of the channel per unit time has been 
defined by Shannon (1948b) as 

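$$C = \lim_{T \to \infty} \max_{p(x)} \frac{1}{T} I(X; Y)$$ (7-1-23)

where the average mutual information $I(X; Y)$ is given in (3-2-17). Alternatively,
we may use the samples or the coefficients $\{y_i\}$, $\{x_i\}$, and $\{n_i\}$ in the series
expansions of $y(t)$, $x(t)$, and $n(t)$, respectively, to determine the average
mutual information between $\mathbf{x}_N = [x_1 \; x_2 \; \cdots \; x_N]$ and $\mathbf{y}_N = [y_1 \; y_2 \; \cdots \; y_N]$,
where $N = 2WT$, $y_i = x_i + n_i$, and $p(y_i \mid x_i)$ is given by (7-1-12). The average
mutual information between $\mathbf{x}_N$ and $\mathbf{y}_N$ for the AWGN channel is

$$I(\mathbf{X}_N; \mathbf{Y}_N) = \int_{\mathbf{x}_N} \cdots \int \int_{\mathbf{y}_N} \cdots \int p(\mathbf{y}_N \mid \mathbf{x}_N) p(\mathbf{x}_N) \log \frac{p(\mathbf{y}_N \mid \mathbf{x}_N)}{p(\mathbf{y}_N)} \, d\mathbf{x}_N \, d\mathbf{y}_N$$
$$= \sum_{i=1}^{N} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(y_i \mid x_i) p(x_i) \log \frac{p(y_i \mid x_i)}{p(y_i)} \, dy_i \, dx_i$$ (7-1-24)

where

$$p(y_i \mid x_i) = \frac{1}{\sqrt{\pi N_0}} e^{-(y_i - x_i)^2 / N_0}$$ (7-1-25)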
The maximum of $I(X; Y)$ over the input pdfs $p(x_i)$ is obtained when the $\{x_i\}$
are statistically independent zero-mean gaussian random variables, i.e.,

$$p(x_i) = \frac{1}{\sqrt{2\pi}\,\sigma_x} e^{-x_i^2 / 2\sigma_x^2}$$ (7-1-26)

where $\sigma_x^2$ is the variance of each $x_i$. Then, it follows from (7-1-24) that

$$\max_{p(x)} I(\mathbf{X}_N; \mathbf{Y}_N) = \sum_{i=1}^{N} \frac{1}{2} \log \left( 1 + \frac{2\sigma_x^2}{N_0} \right) = \frac{1}{2} N \log \left( 1 + \frac{2\sigma_x^2}{N_0} \right) = WT \log \left( 1 + \frac{2\sigma_x^2}{N_0} \right)$$ (7-1-27)

Suppose that we put a constraint on the average power in $x(t)$. That is,

$$P_{av} = \frac{1}{T} \int_0^T E[x^2(t)] \, dt = \frac{1}{T} \sum_{i=1}^{N} E(x_i^2) = \frac{N \sigma_x^2}{T}$$ (7-1-28)

Hence,

$$\sigma_x^2 = \frac{T P_{av}}{N} = \frac{P_{av}}{2W}$$ (7-1-29)

Substitution of this result into (7-1-27) for $\sigma_x^2$ yields

$$\max_{p(x)} I(\mathbf{X}_N; \mathbf{Y}_N) = WT \log \left( 1 + \frac{P_{av}}{W N_0} \right)$$ (7-1-30)

Finally, the channel capacity per unit time is obtained by dividing the result in
(7-1-30) by $T$. Thus

$$C = W \log \left( 1 + \frac{P_{av}}{W N_0} \right)$$ (7-1-31)

This is the basic formula for the capacity of the band-limited AWGN
waveform channel with a band-limited and average power-limited input. It was
originally derived by Shannon (1948b).

FIGURE 7-1-7 Normalized channel capacity as a function of SNR for band-limited
AWGN channel. [Horizontal axis: $10 \log (P_{av}/W N_0)$ (dB).]

The capacity in bits/s normalized by the bandwidth W is plotted in
Fig. 7-1-7 as a function of the ratio of signal power $P_{av}$ to noise power
$W N_0$. Note that the capacity increases monotonically with increasing SNR.
Thus, for a fixed bandwidth, the capacity of the waveform channel increases with
an increase in the transmitted signal power. On the other hand, if $P_{av}$ is
fixed, the capacity can be increased by increasing the bandwidth W. Figure 7-1-8
illustrates a graph of C versus W. Note that as W approaches infinity, the
capacity of the channel approaches the asymptotic value

$$C_\infty = \frac{P_{av}}{N_0} \log_2 e = \frac{P_{av}}{N_0 \ln 2} \text{ bits/s}$$ (7-1-32)

FIGURE 7-1-8 Channel capacity as a function of bandwidth with a fixed
transmitted average power. [Horizontal axis: W (Hz).]

It is instructive to express the normalized channel capacity C/W as a
function of the SNR per bit. Since $P_{av}$ represents the average transmitted
power and C is the rate in bits/s, it follows that

$$P_{av} = C \mathcal{E}_b$$ (7-1-33)

where $\mathcal{E}_b$ is the energy per bit. Hence, (7-1-31) may be expressed as

$$\frac{C}{W} = \log_2\left(1 + \frac{C}{W} \frac{\mathcal{E}_b}{N_0}\right)$$ (7-1-34)

Consequently,

$$\frac{\mathcal{E}_b}{N_0} = \frac{2^{C/W} - 1}{C/W}$$ (7-1-35)

When $C/W = 1$, $\mathcal{E}_b/N_0 = 1$ (0 dB). As $C/W \to \infty$,

$$\frac{\mathcal{E}_b}{N_0} \approx \frac{2^{C/W}}{C/W} = \exp\left(\frac{C}{W} \ln 2 - \ln \frac{C}{W}\right)$$ (7-1-36)

Thus, $\mathcal{E}_b/N_0$ increases exponentially as $C/W \to \infty$. On the
other hand, as $C/W \to 0$,

$$\frac{\mathcal{E}_b}{N_0} = \lim_{C/W \to 0} \frac{2^{C/W} - 1}{C/W} = \ln 2$$ (7-1-37)

which is -1.6 dB. A plot of C/W versus $\mathcal{E}_b/N_0$ is shown in
Fig. 5-2-17.
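
The capacity expressions (7-1-31), (7-1-32), and (7-1-35) are easy to evaluate
numerically. The following Python sketch is an illustration only: the function
names and the sample values of $P_{av}$ and $N_0$ are assumptions made for this
example and are not part of the original development.

```python
import math

LN2 = math.log(2.0)


def awgn_capacity(bandwidth_hz, p_av, n0):
    """Capacity in bits/s of the band-limited AWGN channel, as in (7-1-31)."""
    snr = p_av / (bandwidth_hz * n0)
    # log1p keeps precision when the SNR is very small (large bandwidth).
    return bandwidth_hz * math.log1p(snr) / LN2


def infinite_bandwidth_capacity(p_av, n0):
    """Asymptotic capacity in bits/s as W grows without bound, as in (7-1-32)."""
    return p_av / (n0 * LN2)


def required_ebn0(c_over_w):
    """E_b/N_0 (linear scale) needed to operate at C/W bits/s/Hz, as in (7-1-35)."""
    # expm1(x * ln 2) equals 2**x - 1 without cancellation for small x.
    return math.expm1(c_over_w * LN2) / c_over_w


def to_db(ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    p_av, n0 = 1.0, 1e-3  # hypothetical power (W) and noise density (W/Hz)
    print("C_inf = %.1f bits/s" % infinite_bandwidth_capacity(p_av, n0))
    for w in (1e2, 1e3, 1e4, 1e6):
        print("W = %8.0f Hz -> C = %7.1f bits/s" % (w, awgn_capacity(w, p_av, n0)))
    for r in (0.01, 0.5, 1.0, 2.0, 6.0):
        print("C/W = %4.2f -> Eb/N0 = %6.2f dB" % (r, to_db(required_ebn0(r))))
```

With these helpers, the capacity for a fixed $P_{av}$ can be seen to level off
at $C_\infty$ as W grows, and the required SNR per bit approaches $\ln 2$ as
C/W becomes small.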

Thus, we have derived the channel capacities of three important channel 
models that are considered in this book. The first is the discrete-input, 
discrete-output channel, of which the BSC is a special case. The second is a 
discrete-input, continuous-output memoryless additive white gaussian noise 
channel. From these two channel models, we can obtain benchmarks for the 
coded performance with hard- and soft-decision decoding in digital com- 
munications systems. 

The third channel model focuses on the capacity in bits/s of a waveform 
channel. In this case, we assumed that we have a bandwidth limitation on the 
channel, an additive gaussian noise that corrupts the signal, and an average 
power constraint at the transmitter. Under these conditions, we derived the 
result given in (7-1-31). 

The major significance of the channel capacity formulas given above is that
they serve as upper limits on the transmission rate for reliable communication
over a noisy channel. The fundamental role that the channel capacity plays is
given by the noisy channel coding theorem due to Shannon (1948a).


Noisy Channel Coding Theorem 

There exist channel codes (and decoders) that make it possible to achieve
reliable communication, with as small an error probability as desired, if the
transmission rate R < C, where C is the channel capacity. If R > C, it is not
possible to make the probability of error tend toward zero with any code.


In the following section, we explore the benefits of coding for the additive
noise channel models described above, and use the channel capacity as the
benchmark for assessing code performance.


7-1-3 Achieving Channel Capacity with Orthogonal Signals 

In Section 5-2, we used a simple union bound to show that, for orthogonal
signals, the probability of error can be made as small as desired by increasing
the number M of waveforms, provided that $\mathcal{E}_b/N_0 > 2 \ln 2$. We
indicated that the simple union bound does not produce the smallest lower bound
on the SNR per bit. The problem is that the upper bound used on Q(x) is very
loose for small x.

An alternative approach is to use two different upper bounds for Q(x),
depending on the value of x. Beginning with (5-2-21), we observe that

$$1 - [1 - Q(y)]^{M-1} \le (M - 1) Q(y) < M e^{-y^2/2}$$ (7-1-38)

This is just the union bound, which is tight when y is large, i.e., for
$y > y_0$, where $y_0$ depends on M. When y is small, the union bound exceeds
unity for large M. Since

$$1 - [1 - Q(y)]^{M-1} \le 1$$ (7-1-39)

for all y, we may use this bound for $y < y_0$ because it is tighter than the
union bound. Thus (5-2-21) may be upper-bounded as

$$P_M < \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y_0} e^{-(y - \sqrt{2\gamma})^2/2} \, dy + \frac{M}{\sqrt{2\pi}} \int_{y_0}^{\infty} e^{-y^2/2} e^{-(y - \sqrt{2\gamma})^2/2} \, dy$$ (7-1-40)

The value of $y_0$ that minimizes this upper bound is found by differentiating
the right-hand side of (7-1-40) and setting the derivative equal to zero. It is
easily verified that the solution is

$$e^{y_0^2/2} = M$$ (7-1-41)

or, equivalently,

$$y_0 = \sqrt{2 \ln M} = \sqrt{2 \ln 2 \, \log_2 M} = \sqrt{2k \ln 2}$$ (7-1-42)


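Having determined $y_0$, let us now compute simple exponential upper bounds
for the integrals in (7-1-40). For the first integral, we have

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y_0} e^{-(y - \sqrt{2\gamma})^2/2} \, dy = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-(\sqrt{2\gamma} - y_0)} e^{-x^2/2} \, dx = Q(\sqrt{2\gamma} - y_0) < e^{-(\sqrt{2\gamma} - y_0)^2/2}, \quad y_0 \le \sqrt{2\gamma}$$ (7-1-43)

The second integral is upper-bounded as follows:

$$\frac{M}{\sqrt{2\pi}} \int_{y_0}^{\infty} e^{-y^2/2} e^{-(y - \sqrt{2\gamma})^2/2} \, dy = \frac{M}{\sqrt{2\pi}} e^{-\gamma/2} \int_{y_0 - \sqrt{\gamma/2}}^{\infty} e^{-x^2} \, dx < \begin{cases} M e^{-\gamma/2} & (y_0 \le \sqrt{\gamma/2}) \\ M e^{-\gamma/2} e^{-(y_0 - \sqrt{\gamma/2})^2} & (y_0 \ge \sqrt{\gamma/2}) \end{cases}$$ (7-1-44)

Combining the bounds for the two integrals and substituting $e^{y_0^2/2}$ for
M, we obtain

$$P_M < \begin{cases} e^{-(\sqrt{2\gamma} - y_0)^2/2} + e^{(y_0^2 - \gamma)/2} & (0 \le y_0 \le \sqrt{\gamma/2}) \\ e^{-(\sqrt{2\gamma} - y_0)^2/2} + e^{(y_0^2 - \gamma)/2} e^{-(y_0 - \sqrt{\gamma/2})^2} & (\sqrt{\gamma/2} \le y_0 \le \sqrt{2\gamma}) \end{cases}$$ (7-1-45)

In the range $0 \le y_0 \le \sqrt{\gamma/2}$, the bound may be expressed as

$$P_M < e^{(y_0^2 - \gamma)/2} \left(1 + e^{-(y_0 - \sqrt{\gamma/2})^2}\right) < 2 e^{(y_0^2 - \gamma)/2}, \quad 0 \le y_0 \le \sqrt{\gamma/2}$$ (7-1-46)

In the range $\sqrt{\gamma/2} \le y_0 \le \sqrt{2\gamma}$, the two terms in
(7-1-45) are identical. Hence,

$$P_M < 2 e^{-(\sqrt{2\gamma} - y_0)^2/2}, \quad \sqrt{\gamma/2} \le y_0 \le \sqrt{2\gamma}$$ (7-1-47)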

Now we substitute for $y_0$ and $\gamma$. Since
$y_0 = \sqrt{2 \ln M} = \sqrt{2k \ln 2}$ and $\gamma = k\gamma_b$, the bounds in
(7-1-46) and (7-1-47) may be expressed as

$$P_M < \begin{cases} 2 e^{-k(\gamma_b - 2 \ln 2)/2} & (\ln M \le \tfrac{1}{4}\gamma) \\ 2 e^{-k(\sqrt{\gamma_b} - \sqrt{\ln 2})^2} & (\tfrac{1}{4}\gamma \le \ln M \le \gamma) \end{cases}$$ (7-1-48)

The first upper bound coincides with the union bound presented earlier, but it
is loose for large values of M. The second upper bound is better for large
values of M. We note that $P_M \to 0$ as $k \to \infty$ ($M \to \infty$)
provided that $\gamma_b > \ln 2$.

But ln 2 is the limiting value of the SNR per bit required for reliable
transmission when signaling at a rate equal to the capacity of the
infinite-bandwidth AWGN channel, as shown in Section 7-1-2. In fact, when the
substitutions

$$y_0 = \sqrt{2k \ln 2} = \sqrt{2RT \ln 2}, \qquad \gamma = \frac{T P_{av}}{N_0} = T C_\infty \ln 2$$ (7-1-49)

are made into the two upper bounds given in (7-1-46) and (7-1-47), where
$C_\infty = P_{av}/(N_0 \ln 2)$ is the capacity of the infinite-bandwidth AWGN
channel, the result is

$$P_M < \begin{cases} 2 \cdot 2^{-T(\frac{1}{2} C_\infty - R)} & (0 \le R \le \tfrac{1}{4} C_\infty) \\ 2 \cdot 2^{-T(\sqrt{C_\infty} - \sqrt{R})^2} & (\tfrac{1}{4} C_\infty \le R \le C_\infty) \end{cases}$$ (7-1-50)

Thus we have expressed the bounds in terms of $C_\infty$ and the bit rate in the
channel. The first upper bound is appropriate for rates below
$\tfrac{1}{4} C_\infty$, while the second is tighter than the first for rates
between $\tfrac{1}{4} C_\infty$ and $C_\infty$. Clearly, the probability of error
can be made arbitrarily small by making $T \to \infty$ ($M \to \infty$ for fixed
R), provided that $R < C_\infty = P_{av}/(N_0 \ln 2)$. Furthermore, we observe
that the set of orthogonal waveforms achieves the channel capacity bound as
$M \to \infty$, when the rate $R < C_\infty$.


7-1-4 Channel Reliability Functions 

The exponential bounds on the error probability for M-ary orthogonal
signals on an infinite-bandwidth AWGN channel given by (7-1-50) may be
expressed as

$$P_M < 2 \cdot 2^{-TE(R)}$$ (7-1-51)

The exponential factor

$$E(R) = \begin{cases} \tfrac{1}{2} C_\infty - R & (0 \le R \le \tfrac{1}{4} C_\infty) \\ (\sqrt{C_\infty} - \sqrt{R})^2 & (\tfrac{1}{4} C_\infty \le R \le C_\infty) \end{cases}$$ (7-1-52)

in (7-1-51) is called the channel reliability function for the
infinite-bandwidth AWGN channel. A plot of $E(R)/C_\infty$ is shown in
Fig. 7-1-9. Also shown is the exponential factor for the union bound on $P_M$,
given by (5-2-27), which may be expressed as

$$P_M \le \tfrac{1}{2} \cdot 2^{-T(\frac{1}{2} C_\infty - R)}, \quad 0 \le R \le \tfrac{1}{2} C_\infty$$ (7-1-53)

Clearly, the exponential factor in (7-1-53) is not as tight as E(R), due to the
looseness of the union bound.
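
To make the comparison between (7-1-52) and (7-1-53) concrete, the two
exponents can be evaluated side by side. The Python sketch below is only an
illustration; the function names are assumptions introduced for this example,
and rates and $C_\infty$ may be given in any common unit (e.g., bits/s).

```python
import math


def reliability_exponent(rate, c_inf):
    """Channel reliability function E(R) of (7-1-52), valid for 0 <= R <= C_inf."""
    if not 0.0 <= rate <= c_inf:
        raise ValueError("rate must lie in [0, C_inf]")
    if rate <= c_inf / 4.0:
        return c_inf / 2.0 - rate
    return (math.sqrt(c_inf) - math.sqrt(rate)) ** 2


def union_bound_exponent(rate, c_inf):
    """Exponent of the union bound (7-1-53); it is positive only for R < C_inf / 2."""
    return c_inf / 2.0 - rate


def error_bound(rate, c_inf, duration):
    """Upper bound (7-1-51) on P_M, clipped at 1 because it bounds a probability."""
    return min(1.0, 2.0 * 2.0 ** (-duration * reliability_exponent(rate, c_inf)))
```

For example, at $R = 0.6 C_\infty$ the union-bound exponent is already negative
(the bound is useless), whereas E(R) is still positive, so the bound (7-1-51)
continues to decay as T increases.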

The bound given by (7-1-51) and (7-1-52) has been shown by Gallager
(1965) to be exponentially tight. This means that there does not exist another
reliability function, say $E_1(R)$, satisfying the condition $E_1(R) > E(R)$
for any R. Consequently, the error probability is bounded from above and below
as

$$K_l 2^{-TE(R)} \le P_e \le K_u 2^{-TE(R)}$$ (7-1-54)

where the constants have only a weak dependence on T, i.e., they vary slowly
with T.

FIGURE 7-1-9 Channel reliability function for the infinite-bandwidth AWGN
channel.

Since orthogonal signals provide essentially the same performance as the 
optimum simplex signals for large M, the lower bound in (7-1-54) applies for 
any signal set. Hence, the reliability function E(R) given by (7-1-52) 
determines the exponential characteristics of the error probability for digital 
signaling over the infinite-bandwidth AWGN channel. 

Although the error probability can be made arbitrarily small by increasing
the number of either orthogonal, biorthogonal, or simplex signals, with
$R < C_\infty$, for a relatively modest number of signals, there is a large gap
between the actual performance and the best achievable performance given by the
channel capacity formula. For example, from Fig. 5-2-17, we observe that a set
of M = 16 orthogonal signals detected coherently requires an SNR per bit of
approximately 7.5 dB to achieve a bit error rate of $P_e = 10^{-5}$. In
contrast, the channel capacity formula indicates that for C/W = 0.5, reliable
transmission is possible with an SNR of -0.8 dB. This represents a rather large
difference of 8.3 dB/bit and serves as a motivation for searching for more
efficient signaling waveforms. In this chapter and in Chapter 8, we demonstrate
that coded waveforms can reduce this gap considerably.

Similar gaps in performance also exist in the bandwidth-limited region of
Fig. 5-2-17, where R/W > 1. In this region, however, we must be more clever
in how we use coding to improve performance, because we cannot expand the
bandwidth as in the power-limited region. The use of coding techniques for
bandwidth-efficient communication is also treated in Chapter 8.


7-2 RANDOM SELECTION OF CODES 

The design of coded modulation for efficient transmission of information may 
be divided into two basic approaches. One is the algebraic approach, which is 
primarily concerned with the design of coding and decoding techniques for 
specific classes of codes, such as cyclic block codes and convolutional codes. 
The second is the probabilistic approach, which is concerned with the analysis
of the performance of a general class of coded signals. This approach yields
bounds on the probability of error that can be attained for communication over
a channel having some specified characteristic.

In this section, we adopt the probabilistic approach to coded modulation. 
The algebraic approach, based on block codes and on convolutional codes, is 
treated in Chapter 8. 


7-2-1 Random Coding Based on M-ary Binary Coded Signals 

Let us consider a set of M coded signal waveforms constructed from a set of
n-dimensional binary code words of the form

$$\mathbf{C}_i = [c_{i1}\ c_{i2}\ \ldots\ c_{in}], \quad i = 1, 2, \ldots, M$$ (7-2-1)

where $c_{ij} = 0$ or 1. Each bit in the code word is mapped into a binary PSK
waveform, so that the signal waveform corresponding to the code word
$\mathbf{C}_i$ may be expressed as

$$s_i(t) = \sum_{j=1}^{n} s_{ij} f_j(t), \quad i = 1, 2, \ldots, M$$ (7-2-2)

where

$$s_{ij} = \begin{cases} \sqrt{\mathcal{E}_c} & \text{when } c_{ij} = 1 \\ -\sqrt{\mathcal{E}_c} & \text{when } c_{ij} = 0 \end{cases}$$ (7-2-3)

and $\mathcal{E}_c$ is the energy per code bit. Thus, the waveforms $s_i(t)$ are
equivalent to the n-dimensional vectors

$$\mathbf{s}_i = [s_{i1}\ s_{i2}\ \ldots\ s_{in}], \quad i = 1, 2, \ldots, M$$ (7-2-4)

which correspond to the vertices of a hypercube in n-dimensional space.

Now, suppose that the information rate into the encoder is R bits/s and we
encode blocks of k bits at a time into one of the M waveforms. Hence, k = RT
and $M = 2^k = 2^{RT}$ signals are required. It is convenient to define a
parameter D as

$$D = \frac{n}{T} \text{ dimensions/s}$$ (7-2-5)

Thus, n = DT is the dimensionality of the signal space.

The hypercube has $2^n = 2^{DT}$ vertices, of which $M = 2^{RT}$ may be used to
transmit the information. If we impose the condition that D > R, the fraction
of the vertices that we use as signal points is

$$F = \frac{2^k}{2^n} = \frac{2^{RT}}{2^{DT}} = 2^{-(D-R)T}$$ (7-2-6)

Clearly, if D > R, we have $F \to 0$ as $T \to \infty$.

The question that we wish to pose is the following. Can we choose a subset of
$M = 2^{RT}$ vertices out of the $2^n = 2^{DT}$ available vertices such that the
probability of error $P_e \to 0$ as $T \to \infty$ or, equivalently, as
$n \to \infty$? Since the fraction F of vertices used approaches zero as
$T \to \infty$, it should be possible to select M signal waveforms having a
minimum distance that increases as $T \to \infty$ and, thus, $P_e \to 0$.

Instead of attempting to find a single set of M coded waveforms for which
we compute the error probability, let us consider the ensemble of $(2^n)^M$
distinct ways in which we can select M vertices from the $2^n$ available
vertices of the hypercube. Associated with each of the $2^{nM}$ selections,
there is a communication system, consisting of a modulator, a channel, and a
demodulator, that is optimum for the selected set of M waveforms. Thus, there
are $2^{nM}$ communication systems, one for each choice of the M coded
waveforms, as illustrated in Fig. 7-2-1. Each communication system is
characterized by its probability of error.

FIGURE 7-2-1 An ensemble of $2^{nM}$ communication systems. Each system employs
a different set of M signals from the set of $2^{nM}$ possible choices.

Suppose that our choice of M coded waveforms is based on random
selection from the set of $2^{nM}$ possible sets of codes. Thus, the random
selection of the mth code, denoted by $\{\mathbf{s}_i\}_m$, occurs with
probability

$$P(\{\mathbf{s}_i\}_m) = 2^{-nM}$$ (7-2-7)

and the corresponding conditional probability of error for this choice of coded
signals is $P_e(\{\mathbf{s}_i\}_m)$. Then, the average probability of error
over the ensemble of codes is

$$\bar{P}_e = \sum_{m=1}^{2^{nM}} P_e(\{\mathbf{s}_i\}_m) P(\{\mathbf{s}_i\}_m) = 2^{-nM} \sum_{m=1}^{2^{nM}} P_e(\{\mathbf{s}_i\}_m)$$ (7-2-8)

where the overbar on $P_e$ denotes an average over the ensemble of codes.

It is clear that some choices of codes will result in large probability of
error. For example, the code that assigns all M k-bit sequences to the same
vertex of the hypercube will result in a large probability of error. In such a
case, $P_e(\{\mathbf{s}_i\}_m) > \bar{P}_e$. However, there will also be choices
of codes for which $P_e(\{\mathbf{s}_i\}_m) < \bar{P}_e$. Consequently, if we
obtain an upper bound on $\bar{P}_e$, this bound will also hold for those codes
for which $P_e(\{\mathbf{s}_i\}_m) < \bar{P}_e$. Furthermore, if
$\bar{P}_e \to 0$ as $T \to \infty$, then we conclude that, for these codes,
$P_e(\{\mathbf{s}_i\}_m) \to 0$ as $T \to \infty$.

In order to determine an upper bound on $\bar{P}_e$, we consider the
transmission of a k-bit message $\mathbf{X}_k = [x_1\ x_2\ x_3\ \ldots\ x_k]$,
where $x_j = 0$ or 1 for $j = 1, 2, \ldots, k$. The conditional probability of
error averaged over the ensemble of codes is

$$\bar{P}_e(\mathbf{X}_k) = \sum_{\text{all codes}} P_e(\mathbf{X}_k, \{\mathbf{s}_i\}_m) P(\{\mathbf{s}_i\}_m)$$ (7-2-9)

where $P_e(\mathbf{X}_k, \{\mathbf{s}_i\}_m)$ is the conditional probability of
error for a given k-bit message $\mathbf{X}_k$, which is transmitted by use of
the code $\{\mathbf{s}_i\}_m$. For the mth code, the probability of error
$P_e(\mathbf{X}_k, \{\mathbf{s}_i\}_m)$ is upper-bounded as

$$P_e(\mathbf{X}_k, \{\mathbf{s}_i\}_m) \le \sum_{\substack{l=1 \\ l \ne k}}^{M} P_{2m}(\mathbf{s}_l, \mathbf{s}_k)$$ (7-2-10)

where $P_{2m}(\mathbf{s}_l, \mathbf{s}_k)$ is the probability of error for a
binary communication system that employs the signal vectors $\mathbf{s}_l$ and
$\mathbf{s}_k$ to communicate one of two equally likely k-bit messages. Hence,

$$\bar{P}_e(\mathbf{X}_k) \le \sum_{\text{all codes}} P(\{\mathbf{s}_i\}_m) \sum_{\substack{l=1 \\ l \ne k}}^{M} P_{2m}(\mathbf{s}_l, \mathbf{s}_k)$$ (7-2-11)

If we interchange the order of the summations in (7-2-11) we obtain

$$\bar{P}_e(\mathbf{X}_k) \le \sum_{\substack{l=1 \\ l \ne k}}^{M} \left[ \sum_{\text{all codes}} P(\{\mathbf{s}_i\}_m) P_{2m}(\mathbf{s}_l, \mathbf{s}_k) \right] \le \sum_{\substack{l=1 \\ l \ne k}}^{M} \bar{P}_2(\mathbf{s}_l, \mathbf{s}_k)$$ (7-2-12)

where $\bar{P}_2(\mathbf{s}_l, \mathbf{s}_k)$ represents the ensemble average of
$P_{2m}(\mathbf{s}_l, \mathbf{s}_k)$ over the $2^{nM}$ codes or the $2^{nM}$
communication systems.

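For the additive white gaussian noise channel, the binary error probability
$P_{2m}(\mathbf{s}_l, \mathbf{s}_k)$ is

$$P_{2m}(\mathbf{s}_l, \mathbf{s}_k) = Q\left(\sqrt{\frac{d_{lk}^2}{2N_0}}\right)$$ (7-2-13)

where $d_{lk}^2 = |\mathbf{s}_l - \mathbf{s}_k|^2$. If $\mathbf{s}_l$ and
$\mathbf{s}_k$ differ in d coordinates,

$$d_{lk}^2 = |\mathbf{s}_l - \mathbf{s}_k|^2 = \sum_{j=1}^{n} (s_{lj} - s_{kj})^2 = d(2\sqrt{\mathcal{E}_c})^2 = 4d\mathcal{E}_c$$ (7-2-14)

Hence,

$$P_{2m}(\mathbf{s}_l, \mathbf{s}_k) = Q\left(\sqrt{\frac{2d\mathcal{E}_c}{N_0}}\right)$$ (7-2-15)

Now, we can average $P_{2m}(\mathbf{s}_l, \mathbf{s}_k)$ over the ensemble of
codes. Since all the codes are equally probable, the signal vector
$\mathbf{s}_l$ is equally likely to be any of the $2^n$ possible vertices of the
hypercube and it is statistically independent of the signal vector
$\mathbf{s}_k$. Therefore, $P(s_{lj} = s_{kj}) = \tfrac{1}{2}$ and
$P(s_{lj} \ne s_{kj}) = \tfrac{1}{2}$, independently for all
$j = 1, 2, \ldots, n$. Consequently, the probability that $\mathbf{s}_l$ and
$\mathbf{s}_k$ differ in d positions is simply

$$P(d) = \left(\tfrac{1}{2}\right)^n \binom{n}{d}$$ (7-2-16)

Hence, the expected value of $P_{2m}(\mathbf{s}_l, \mathbf{s}_k)$ over the
ensemble of codes may be expressed as

$$\bar{P}_2(\mathbf{s}_l, \mathbf{s}_k) = \sum_{d=0}^{n} P(d) \, Q\left(\sqrt{\frac{2d\mathcal{E}_c}{N_0}}\right) = 2^{-n} \sum_{d=0}^{n} \binom{n}{d} Q\left(\sqrt{\frac{2d\mathcal{E}_c}{N_0}}\right)$$ (7-2-17)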

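The result (7-2-17) can be simplified if we upper-bound the Q-function as

$$Q\left(\sqrt{\frac{2d\mathcal{E}_c}{N_0}}\right) < e^{-d\mathcal{E}_c/N_0}$$

Thus,

$$\bar{P}_2(\mathbf{s}_l, \mathbf{s}_k) \le 2^{-n} \sum_{d=0}^{n} \binom{n}{d} e^{-d\mathcal{E}_c/N_0} \le 2^{-n} \left(1 + e^{-\mathcal{E}_c/N_0}\right)^n \le \left[\tfrac{1}{2}\left(1 + e^{-\mathcal{E}_c/N_0}\right)\right]^n$$ (7-2-18)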
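
As a numerical check of this step, the exact ensemble average in (7-2-17) can
be compared with the closed-form bound in (7-2-18) for small n. The Python
sketch below is an illustration under stated assumptions: the function names
are hypothetical, and `ec_n0` denotes $\mathcal{E}_c/N_0$ on a linear scale.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def average_pairwise_error(n, ec_n0):
    """Ensemble average of the binary error probability, as in (7-2-17)."""
    total = 0.0
    for d in range(n + 1):
        # Weight Q(sqrt(2 d Ec/N0)) by the number of vertex pairs at distance d.
        total += math.comb(n, d) * q_function(math.sqrt(2.0 * d * ec_n0))
    return total / 2.0 ** n


def pairwise_bound(n, ec_n0):
    """Closed-form upper bound of (7-2-18)."""
    return (0.5 * (1.0 + math.exp(-ec_n0))) ** n
```

The direct sum is intended for modest block lengths only (the binomial
coefficients become very large for n in the thousands); the closed-form bound
has no such restriction.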

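We observe that the right-hand side of (7-2-18) is independent of the indices l
and k. Hence, when we substitute the bound (7-2-18) into (7-2-12), we obtain

$$\bar{P}_e(\mathbf{X}_k) \le \sum_{\substack{l=1 \\ l \ne k}}^{M} \bar{P}_2(\mathbf{s}_l, \mathbf{s}_k) = (M - 1) \left[\tfrac{1}{2}\left(1 + e^{-\mathcal{E}_c/N_0}\right)\right]^n < M \left[\tfrac{1}{2}\left(1 + e^{-\mathcal{E}_c/N_0}\right)\right]^n$$

Finally, the unconditional average error probability $\bar{P}_e$ is obtained by
averaging $\bar{P}_e(\mathbf{X}_k)$ over all possible k-bit information
sequences. Thus,

$$\bar{P}_e = \sum_{k} \bar{P}_e(\mathbf{X}_k) P(\mathbf{X}_k) < M \left[\tfrac{1}{2}\left(1 + e^{-\mathcal{E}_c/N_0}\right)\right]^n \sum_{k} P(\mathbf{X}_k) < M \left[\tfrac{1}{2}\left(1 + e^{-\mathcal{E}_c/N_0}\right)\right]^n$$ (7-2-19)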

This result can be expressed in a more convenient form by first defining a
parameter $R_0$, which is called the cutoff rate and has units of
bits/dimension, as

$$R_0 = \log_2 \frac{2}{1 + e^{-\mathcal{E}_c/N_0}} = 1 - \log_2\left(1 + e^{-\mathcal{E}_c/N_0}\right), \quad \text{antipodal signaling}$$ (7-2-20)

Then, (7-2-19) becomes

$$\bar{P}_e < M 2^{-nR_0} = 2^{RT} 2^{-nR_0}$$ (7-2-21)

FIGURE 7-2-2 The cutoff rate $R_0$ as a function of the SNR per dimension
in decibels.

Since n = DT, (7-2-21) may be expressed as

$$\bar{P}_e < 2^{-T(DR_0 - R)}$$ (7-2-22)

The parameter $R_0$ is plotted as a function of $\mathcal{E}_c/N_0$ in
Fig. 7-2-2. We observe that $0 \le R_0 \le 1$. Consequently, $\bar{P}_e \to 0$
as $T \to \infty$ provided that the information rate $R < DR_0$.

Alternatively, (7-2-21) may be expressed as

$$\bar{P}_e < 2^{-n(R_0 - R/D)}$$ (7-2-23)

The ratio R/D also has units of bits/dimension and may be defined as

$$R_c = \frac{R}{D} = \frac{R}{n/T} = \frac{RT}{n} = \frac{k}{n}$$ (7-2-24)

Hence, $R_c$ is the code rate and

$$\bar{P}_e < 2^{-n(R_0 - R_c)}$$ (7-2-25)

We conclude that when $R_c < R_0$, the average probability of error
$\bar{P}_e \to 0$ as the code block length $n \to \infty$. Since the average
value of the probability of error can be made arbitrarily small as
$n \to \infty$, it follows that there exist codes in the ensemble of $2^{nM}$
codes that have a probability of error no larger than $\bar{P}_e$.
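
The behavior of the bound (7-2-25) with block length is easy to tabulate. The
following Python sketch is illustrative only; the function names are
assumptions made for this example, and `ec_n0` denotes $\mathcal{E}_c/N_0$ on a
linear scale.

```python
import math


def cutoff_rate_antipodal(ec_n0):
    """Cutoff rate R_0 in bits/dimension for binary antipodal signaling, (7-2-20)."""
    return 1.0 - math.log2(1.0 + math.exp(-ec_n0))


def ensemble_error_bound(n, code_rate, ec_n0):
    """Random-coding bound (7-2-25), clipped at 1 because it bounds a probability."""
    exponent = n * (cutoff_rate_antipodal(ec_n0) - code_rate)
    if exponent <= 0.0:
        # For R_c >= R_0 the bound exceeds 1 and carries no information.
        return 1.0
    return 2.0 ** (-exponent)


if __name__ == "__main__":
    for n in (50, 100, 500, 1000):
        print("n = %4d -> bound = %.3e" % (n, ensemble_error_bound(n, 0.5, 1.0)))
```

For a code rate below $R_0$ the bound falls exponentially with n, while for a
code rate at or above $R_0$ it stays at the trivial value 1.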

From the derivation of the average error probability given above, we
conclude that good codes exist. Although we do not normally select codes at
random, it is interesting to consider the question of whether or not a randomly
selected code is likely to be a good code. In fact, we can easily show that there
are many good codes in the ensemble. First, we note that $\bar{P}_e$ is an
ensemble average of error probabilities over all codes and that all these
probabilities are obviously positive quantities. If a code is selected at
random, the probability that its error probability $P_e > \alpha \bar{P}_e$ is
less than $1/\alpha$; this is Markov's inequality applied to the nonnegative
random variable $P_e$. Consequently, no more than 10% of the codes have an error
probability that exceeds $10\bar{P}_e$ and no more than 1% of the codes have an
error probability that exceeds $100\bar{P}_e$.

We should emphasize that codes with error probabilities exceeding $\bar{P}_e$
are not necessarily poor codes. For example, suppose that an average error rate
of $\bar{P}_e = 10^{-10}$ can be attained by using codes with dimensionality
$n_0$ when $R_0 > R_c$. Then, if we select a code with error probability
$1000\bar{P}_e = 10^{-7}$, we may compensate for this degradation in error
probability by increasing n from $n_0$ to $n = 10n_0/7$. Thus, by a modest
increase in dimensionality, we have a code with $P_e < 10^{-10}$. In summary,
good codes are abundant and, hence, they are easily found even by random
selection.

It is also interesting to express the average error probability in (7-2-25) in
terms of the SNR per bit, $\gamma_b$. To accomplish this, we express the energy
per signal waveform as

$$\mathcal{E} = n\mathcal{E}_c = k\mathcal{E}_b$$ (7-2-26)

Hence, $n = k\mathcal{E}_b/\mathcal{E}_c$. We also note that
$R_c \mathcal{E}_b/\mathcal{E}_c = 1$. Therefore, (7-2-25) may be expressed as

$$\bar{P}_e < 2^{-k(\gamma_b/\gamma_0 - 1)}$$ (7-2-27)

where $\gamma_0$ is a normalized SNR parameter, defined as

$$\gamma_0 = \frac{R_c}{R_0} \gamma_b = \frac{R_c \gamma_b}{1 - \log_2\left(1 + e^{-R_c \gamma_b}\right)}$$ (7-2-28)

Now, we note that $\bar{P}_e \to 0$ as $k \to \infty$ provided that the SNR per
bit, $\gamma_b > \gamma_0$.

The parameter $\gamma_0$ is plotted in Fig. 7-2-3 as a function of
$R_c \gamma_b$. Note that as $R_c \gamma_b \to 0$, $\gamma_0 \to 2 \ln 2$.
Consequently, the error probability for M-ary binary coded signals is
equivalent to the error probability obtained from the union bound for M-ary
orthogonal signals, provided that the signal dimensionality is sufficiently
large so that $\gamma_0 \approx 2 \ln 2$.

FIGURE 7-2-3 Lower bound on SNR per bit, $\gamma_b$, for binary antipodal
signals.

The dimensionality parameter D that we introduced in (7-2-5) is proportional to
the channel bandwidth required to transmit the signals. Recall from the
sampling theorem that a signal of bandwidth W may be represented by samples
taken at a rate of 2W samples/s. Thus, in the time interval of length T there
are n = 2WT samples or, equivalently, n degrees of freedom (dimensions).
Consequently, D may be equated with 2W.
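
The normalized SNR parameter of (7-2-28) and the bound (7-2-27) can be
evaluated directly. The Python sketch below is an illustration only; the
function names are assumptions for this example, and the `log1p`/`expm1` form
of $R_0$ is simply a numerically careful rewriting of (7-2-20).

```python
import math


def cutoff_rate_antipodal(ec_n0):
    """R_0 of (7-2-20), written as -log2(1 + (exp(-x) - 1)/2) for accuracy at small x."""
    return -math.log1p(0.5 * math.expm1(-ec_n0)) / math.log(2.0)


def gamma_0(rc_gamma_b):
    """Normalized SNR parameter of (7-2-28) as a function of R_c * gamma_b."""
    return rc_gamma_b / cutoff_rate_antipodal(rc_gamma_b)


def error_bound_per_bit(k, gamma_b, code_rate):
    """Bound (7-2-27), clipped at 1 because it bounds a probability."""
    exponent = k * (gamma_b / gamma_0(code_rate * gamma_b) - 1.0)
    return 1.0 if exponent <= 0.0 else 2.0 ** (-exponent)


if __name__ == "__main__":
    for x in (1e-3, 0.1, 0.5, 1.0, 2.0):
        print("Rc*gamma_b = %5.3f -> gamma_0 = %.4f" % (x, gamma_0(x)))
    print("2 ln 2 = %.4f" % (2.0 * math.log(2.0)))
```

Evaluating `gamma_0` for decreasing arguments shows the approach to
$2 \ln 2 \approx 1.386$ noted above.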

Finally, we note that the binary coded signals considered in this section are
appropriate when the SNR per dimension is small, e.g.,
$\mathcal{E}_c/N_0 < 10$. However, when $\mathcal{E}_c/N_0 > 10$, $R_0$
saturates at 1 bit/dimension. Since the code rate is restricted to be less than
$R_0$, binary coded signals become inefficient at $\mathcal{E}_c/N_0 > 10$. In
such a case, we may use nonbinary-coded signals to achieve an increase in the
number of bits per dimension. For example, multiple-amplitude coded signal sets
can be constructed from nonbinary codes by mapping each code element into one of
q possible amplitude levels (as in PAM). Such codes are considered below.

7-2-2 Random Coding Based on M-ary
Multiamplitude Signals

Instead of constructing binary-coded signals, suppose we employ nonbinary
codes with code words of the form given by (7-2-1), where the code elements
$c_{ij}$ are selected from the set $\{0, 1, \ldots, q - 1\}$. Each code element
is mapped into one of q possible amplitude levels. Thus, we construct signals
corresponding to n-dimensional vectors $\{\mathbf{s}_i\}$ as in (7-2-4), where
the components are selected from a multiamplitude set of q possible values. Now,
we have $q^n$ possible signals, from which we select $M = 2^{RT}$ signals to
transmit k-bit blocks of information. The q amplitudes corresponding to the code
elements $\{0, 1, \ldots, q - 1\}$ may be denoted by
$\{a_1, a_2, \ldots, a_q\}$, and they are assumed to be selected according to
some specified probabilities $\{p_l\}$. The amplitude levels are assumed to be
equally spaced over the interval
$[-\sqrt{\mathcal{E}_c}, \sqrt{\mathcal{E}_c}]$. For example, Fig. 7-2-4
illustrates the amplitude values for q = 4. In general, adjacent amplitude
levels are separated by $2\sqrt{\mathcal{E}_c}/(q - 1)$. This assignment
guarantees not only that each component $s_{ij}$ is peak-energy-limited to
$\mathcal{E}_c$, but also that each code word is constrained in average energy
to satisfy the condition

$$|\mathbf{s}_i|^2 < n\mathcal{E}_c$$ (7-2-29)

By repeating the derivation given above for random selection of codes in an
AWGN channel, we find that the average probability of error is upper-bounded as

$$\bar{P}_e < M 2^{-nR_0} = 2^{RT} 2^{-nR_0} = 2^{-n(R_0 - R/D)}$$ (7-2-30)

where $R_0$ is defined as

$$R_0 = -\log_2\left(\sum_{l=1}^{q} \sum_{m=1}^{q} p_l p_m e^{-d_{lm}^2/4N_0}\right)$$ (7-2-31)

and

$$d_{lm} = |a_l - a_m|, \quad l, m = 1, 2, \ldots, q$$ (7-2-32)

FIGURE 7-2-4 Signal alphabet consisting of four amplitude levels.

In the special case where all the amplitude levels are equally likely,
$p_l = p_m = 1/q$ and (7-2-31) reduces to

$$R_0 = -\log_2\left(\frac{1}{q^2} \sum_{l=1}^{q} \sum_{m=1}^{q} e^{-d_{lm}^2/4N_0}\right)$$ (7-2-33)

For example, when q = 2 and $a_1 = -\sqrt{\mathcal{E}_c}$,
$a_2 = \sqrt{\mathcal{E}_c}$, we have $d_{11} = d_{22} = 0$,
$d_{12} = d_{21} = 2\sqrt{\mathcal{E}_c}$, and, hence,

$$R_0 = \log_2 \frac{2}{1 + e^{-\mathcal{E}_c/N_0}}, \quad q = 2$$

which agrees with our previous result. When q = 4,
$a_1 = -\sqrt{\mathcal{E}_c}$, $a_2 = -\sqrt{\mathcal{E}_c}/3$,
$a_3 = \sqrt{\mathcal{E}_c}/3$, and $a_4 = \sqrt{\mathcal{E}_c}$, we have
$d_{mm} = 0$ for m = 1, 2, 3, 4,
$d_{12} = d_{23} = d_{34} = d_{21} = d_{32} = d_{43} = 2\sqrt{\mathcal{E}_c}/3$,
$d_{13} = d_{31} = d_{24} = d_{42} = 4\sqrt{\mathcal{E}_c}/3$, and
$d_{14} = d_{41} = 2\sqrt{\mathcal{E}_c}$. Hence,

$$R_0 = \log_2 \frac{8}{2 + 3e^{-\mathcal{E}_c/9N_0} + 2e^{-4\mathcal{E}_c/9N_0} + e^{-\mathcal{E}_c/N_0}}, \quad q = 4$$ (7-2-34)

Clearly, $R_0$ now saturates at 2 bits/dimension as $\mathcal{E}_c/N_0$
increases.

The graphs of $R_0$ as a function of $\mathcal{E}_c/N_0$ for equally spaced and
equally likely amplitude levels are shown in Fig. 7-2-5 for q = 2, 3, 4, 8, 16,
32, and 64. Note that the saturation level now occurs at $\log_2 q$
bits/dimension. Consequently, for high SNR, $\bar{P}_e \to 0$ as $n \to \infty$
provided that $R < DR_0 = 2WR_0$ bits/s.
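
The double sum in (7-2-33) is straightforward to evaluate for any q. The
Python sketch below is an illustration under stated assumptions: the function
names are hypothetical, $N_0$ is normalized to 1 so that the argument `ec_n0`
plays the role of $\mathcal{E}_c/N_0$, and the levels are equally spaced and
equally likely as in the text.

```python
import math


def pam_levels(q, ec):
    """Return q equally spaced amplitudes on [-sqrt(ec), +sqrt(ec)]."""
    peak = math.sqrt(ec)
    return [-peak + 2.0 * peak * i / (q - 1) for i in range(q)]


def cutoff_rate_pam(q, ec_n0):
    """R_0 in bits/dimension for equally likely levels, as in (7-2-33)."""
    # With N_0 normalized to 1, the energy per dimension equals ec_n0.
    levels = pam_levels(q, ec_n0)
    total = sum(math.exp(-((a_l - a_m) ** 2) / 4.0)
                for a_l in levels for a_m in levels)
    return -math.log2(total / q ** 2)


if __name__ == "__main__":
    for q in (2, 4, 8):
        rates = [cutoff_rate_pam(q, 10.0 ** (db / 10.0)) for db in (0, 10, 20, 40)]
        print("q = %d:" % q, ", ".join("%.3f" % r for r in rates))
```

For q = 2 and q = 4 the function reproduces the closed forms given above, and
for large $\mathcal{E}_c/N_0$ it saturates at $\log_2 q$ bits/dimension.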

FIGURE 7-2-5 Cutoff rate $R_0$ for equally spaced q-level amplitude
modulation with equal probabilities $p_l = 1/q$. [From Principles
of Communication Engineering, by J. M. Wozencraft and
I. M. Jacobs, © 1965 by John Wiley and Sons, Inc. Reprinted
with permission of the publisher.] [Horizontal axis: energy ratio per
dimension, $10 \log (\mathcal{E}_c/N_0)$.]

If we remove the peak energy constraint on each of the elements, but retain
the average energy constraint per code word as given by (7-2-29), it is possible
to obtain a larger upper bound on the number of bits per dimension. For this
case, the result obtained by Shannon (1959b) is

(7-2-35)

The graph of $R_0^*$ as a function of the SNR per dimension,
$\mathcal{E}_c/N_0$, is also shown in Fig. 7-2-5. It is clear that our selection
of the equally spaced, equally likely amplitude levels that result in $R_0$ is
suboptimum. However, these coded signals are easily generated and implemented in
practice. This is an important advantage that justifies their use.


7-2-3 Comparison of $R_0^*$ with the Capacity of the
AWGN Channel

The channel capacity of the band-limited additive white gaussian noise channel
with an average power constraint on the input signal was derived in Section
7-1-2, and is given by

$$C = W \log_2\left(1 + \frac{P_{av}}{W N_0}\right) \text{ bits/s}$$ (7-2-36)

where $P_{av}$ is the average power of the input signal and W is the channel
bandwidth. It is interesting to express the capacity of this channel in terms of
bits/dimension and the average power in terms of energy/dimension. With
D = 2W and the energy per dimension $\mathcal{E}_c = P_{av} T / n$, we have

$$P_{av} = \frac{n}{T} \mathcal{E}_c = D\mathcal{E}_c$$ (7-2-37)

By defining $C_n = C/2W = C/D$ and substituting for W and $P_{av}$, (7-2-36) may
be expressed as

$$C_n = \tfrac{1}{2} \log_2\left(1 + \frac{2\mathcal{E}_c}{N_0}\right) = \tfrac{1}{2} \log_2 (1 + 2R_c\gamma_b) \text{ bits/dimension}$$ (7-2-38)

FIGURE 7-2-6 Comparison of cutoff rate $R_0^*$ with the channel capacity for
an AWGN channel.

This expression for the normalized capacity may be compared with $R_0^*$, as
shown in Fig. 7-2-6. Since $C_n$ is the ultimate upper limit on the transmission
rate R/D, $R_0^* < C_n$ as expected. We also observe that for small values of
$\mathcal{E}_c/N_0$ the difference between $R_0^*$ and $C_n$ is approximately
3 dB. Therefore, the use of randomly selected, optimum average power-limited,
multiamplitude signals yields a rate function $R_0^*$ that is within 3 dB of the
channel capacity. More elaborate bounding techniques are required to show that
the probability of error can be made arbitrarily small when
$R < DC_n = 2WC_n = C$.
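
The normalized capacity (7-2-38) and its relation to (7-2-36) can be checked
numerically. The Python sketch below is an illustration only; the function
names are assumptions for this example. As a choice made for this example, the
comparison curve is the binary antipodal cutoff rate of (7-2-20) rather than
$R_0^*$; at low SNR that rate also falls short of $C_n$ by a factor of about two
(3 dB).

```python
import math


def capacity_per_dimension(ec_n0):
    """Normalized capacity C_n in bits/dimension, as in (7-2-38)."""
    return 0.5 * math.log2(1.0 + 2.0 * ec_n0)


def cutoff_rate_antipodal(ec_n0):
    """Binary antipodal cutoff rate R_0 of (7-2-20), used here only for comparison."""
    return -math.log1p(0.5 * math.expm1(-ec_n0)) / math.log(2.0)


def capacity_bits_per_second(bandwidth_hz, p_av, n0):
    """Recover C = 2 W C_n using E_c = P_av / D with D = 2W, as in (7-2-37)."""
    ec = p_av / (2.0 * bandwidth_hz)
    return 2.0 * bandwidth_hz * capacity_per_dimension(ec / n0)


if __name__ == "__main__":
    for db in (-20, -10, 0, 10):
        x = 10.0 ** (db / 10.0)
        print("Ec/N0 = %4d dB: C_n = %.4f, R_0 (binary) = %.4f"
              % (db, capacity_per_dimension(x), cutoff_rate_antipodal(x)))
```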


7-3 COMMUNICATION SYSTEM DESIGN BASED 
ON THE CUTOFF RATE 

In the foregoing discussion, we characterized coding and modulation per- 
formance in terms of the error probability, which is certainly a meaningful 
criterion for system design. However, in many cases, the computation of the 
error probability is extremely difficult, especially if nonlinear operations such 
as signal quantization are performed in processing the signal at the receiver, or 
if the additive noise is nongaussian. 

Instead of attempting to compute the exact probability of error for specific
codes, we may use the ensemble average probability of error for randomly
selected code words. The channel is assumed to have q input symbols
$\{0, 1, \ldots, q - 1\}$ and Q output symbols $\{0, 1, \ldots, Q - 1\}$, and to
be characterized by the transition probabilities $P(i | j)$, where
$j = 0, 1, \ldots, q - 1$ and $i = 0, 1, \ldots, Q - 1$, with $Q \ge q$. The
input symbols occur with probabilities $\{p_j\}$ and are assumed to be
statistically independent. In addition, the noise on the channel is assumed to
be statistically independent in time, so that there is no dependence among
successive received symbols. Under these conditions, the ensemble average
probability of error for randomly selected code words may be derived by applying
the Chernoff bound (see Viterbi and Omura, 1979).

The general result that is obtained for the discrete memoryless channel is

$$\bar{P}_e < 2^{-n(R_Q - R/D)}$$ (7-3-1)

where n is the block length of the code, R is the information rate in bits/s, D
is the number of dimensions per second, and $R_Q$ is the cutoff rate for a
quantizer with Q levels, defined as

$$R_Q = \max_{\{p_j\}} \left\{ -\log_2 \sum_{i=0}^{Q-1} \left[ \sum_{j=0}^{q-1} p_j \sqrt{P(i | j)} \right]^2 \right\}$$ (7-3-2)

FIGURE 7-3-1 Example of quantization of the demodulator output into five
levels.

From the viewpoint of code design, the combination of modulator,
waveform channel, and demodulator constitutes a discrete-time channel with q
inputs and Q outputs. The transition probabilities $\{P(i | j)\}$ depend on the
channel noise characteristics, the number of quantization levels, and the type
of quantizer, e.g., uniform or nonuniform. For example, in the binary-input
AWGN channel, the pdf of the correlator output at the sampling instant may be
expressed as

$$p(y | j) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(y - m_j)^2/2\sigma^2}, \quad j = 0, 1$$ (7-3-3)

where $m_0 = -\sqrt{\mathcal{E}_c}$, $m_1 = \sqrt{\mathcal{E}_c}$, and
$\sigma^2 = \tfrac{1}{2} N_0$. These two pdfs are shown in Fig. 7-3-1. Also
illustrated in the figure is a quantization scheme that subdivides the real line
into five regions. From such a subdivision, we may compute the transition
probabilities and optimally select the thresholds that subdivide the regions in
a way that maximizes $R_Q$ for any given Q. Thus,

$$P(i | j) = \int_{r_i} p(y | j) \, dy$$ (7-3-4)

where the integral of $p(y | j)$ is evaluated over the region $r_i$ that
corresponds to the transition probability $P(i | j)$.

The value of the rate $R_Q$ in the limit as $Q \to \infty$ yields the cutoff
rate for the unquantized decoder. It is relatively straightforward to show that
as $Q \to \infty$, the first summation (sum from i = 0 to Q - 1) in (7-3-2)
becomes an integral and the transition probabilities are replaced by the
corresponding pdfs. Thus, when the channel consists of q discrete inputs and one
continuous output y, which represents the unquantized output from a matched
filter or a cross-correlator in a system that employs either PSK or a
multiamplitude (PAM) modulation, the cutoff rate is given by

$$R_0 = \max_{\{p_j\}} \left\{ -\log_2 \int_{-\infty}^{\infty} \left[ \sum_{j=0}^{q-1} p_j \sqrt{p(y | j)} \right]^2 dy \right\}$$ (7-3-5)

where $p_j$, $0 \le j \le q - 1$, is the probability of transmitting the jth
symbol and $p(y | j)$ is the conditional probability density function of the
output y from the matched filter or cross-correlator when the jth signal is
transmitted. This is the desired expression for unquantized (soft-decision)
decoding.

We observe that when the input signal is binary PSK with
$p_0 = p_1 = \tfrac{1}{2}$ and the noise is additive, white, and gaussian,
(7-3-5) reduces to the familiar result given previously in (7-2-20).

The general expressions in (7-3-5) and (7-3-2) allow us to compare the 
performance of various receiver implementations based on a different number 
of quantization levels. 


Example 7-3-1 

Let us compare the performance of a binary PSK input signal in an AWGN
channel when the receiver quantizes the output to Q = 2, 4, and 8 levels. To
simplify the optimization problem for the quantization of the signal at the
output of the demodulator, the quantization levels are placed at $0, \pm\tau_b,
\pm 2\tau_b, \ldots, \pm(2^{b-1} - 1)\tau_b$, where $\tau_b$ is the quantizer step-size parameter,
which is to be selected, and b is the number of bits of the quantizer. A good
strategy for the selection of $\tau_b$ is to choose it to minimize the SNR per bit
$\gamma_b$ that is required for operation at a code rate $R_0$. This implies that the
step-size parameter must be optimized for every SNR, which in a practical
implementation of the receiver means that the SNR must be measured.
Fortunately, $\tau_b$ does not exhibit high sensitivity to small changes in SNR, so
that it is possible to optimize $\tau_b$ for one SNR and obtain good performance
for a wide range of SNRs about this nominal value by using a fixed $\tau_b$.

Based on this approach, the expression for $R_Q$ given by (7-3-2) was
evaluated for b = 1 (hard-decision decoding), 2, and 3 bits, corresponding to
Q = 2, 4, and 8 levels of quantization. The results are plotted in Fig. 7-3-2.
The value of $R_0$ for unquantized soft-decision decoding, obtained by
evaluating (7-3-5), is also shown in Fig. 7-3-2. We observe that two-bit
quantization with $\tau_b = 1.0$ gains about 1.4 dB over hard-decision decoding,
and three-bit quantization with $\tau_b = 0.5$ yields an additional 0.4 dB
improvement. Thus, with a three-bit quantizer, we are within 0.2 dB of the
unquantized soft-decision decoding limit. Clearly, there is little to be gained
by increasing the precision any further.

FIGURE 7-3-2 Effect of quantization on the performance of a coded
communications system operating at a rate $R = R_0$ or
$R = R_Q$, with binary PSK modulation on an AWGN
channel.


When a nonbinary code is used in conjunction with M-ary (M = q)
signaling, the received signal at the output of the M matched filters may be
represented by the vector $\mathbf{y} = [y_1\; y_2\; \cdots\; y_M]$. The cutoff rate for this M-input,
M-output (unquantized) channel is

$$R_0 = \max_{\{p_j\}} \left\{ -\log_2 \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} p_i p_j \int_{-\infty}^{\infty} \sqrt{p(\mathbf{y} \mid i)\, p(\mathbf{y} \mid j)}\; d\mathbf{y} \right\} \tag{7-3-6}$$

where $p(\mathbf{y} \mid j)$ is the conditional probability density function of the output
vector $\mathbf{y}$ from the demodulator given that the jth signal was transmitted. Note
that (7-3-6) is similar in form to (7-3-5) except that we now have an M-fold
integral to perform because there are M outputs from the demodulator.

Let us assume that the M signals are orthogonal so that the M outputs 
conditioned on a particular input signal are statistically independent. As a 
consequence, 

$$p(\mathbf{y} \mid j) = p_{s+n}(y_j) \prod_{\substack{i=0 \\ i \ne j}}^{M-1} p_n(y_i) \tag{7-3-7}$$

where $p_{s+n}(y_j)$ is the pdf of the matched filter output corresponding to the
transmitted signal and $\{p_n(y_i)\}$ corresponds to the noise-only outputs from the
other M − 1 matched filters. When (7-3-7) is incorporated into (7-3-6) we
obtain


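$$R_0 = \max_{\{p_j\}} \left\{ -\log_2 \left[ \sum_{j=0}^{M-1} p_j^2 + \sum_{i=0}^{M-1} \sum_{\substack{j=0 \\ j \ne i}}^{M-1} p_i p_j \left( \int_{-\infty}^{\infty} \sqrt{p_{s+n}(y)\, p_n(y)}\; dy \right)^2 \right] \right\} \tag{7-3-8}$$

The maximization of $R_0$ over the set of input probabilities yields $p_j = 1/M$ for
all j. Consequently, (7-3-8) reduces to

$$R_0 = \log_2 \frac{M}{1 + (M-1)\left[ \int_{-\infty}^{\infty} \sqrt{p_{s+n}(y)\, p_n(y)}\; dy \right]^2}
= \log_2 M - \log_2 \left\{ 1 + (M-1)\left[ \int_{-\infty}^{\infty} \sqrt{p_{s+n}(y)\, p_n(y)}\; dy \right]^2 \right\} \tag{7-3-9}$$

This is the desired result for the cutoff rate of an M-ary input, M-ary vector
output unquantized channel.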

For phase coherent detection of the M-ary orthogonal signals the appropriate
pdfs are

$$p_{s+n}(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y - m)^2/2\sigma^2}, \qquad
p_n(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-y^2/2\sigma^2} \tag{7-3-10}$$


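where $m = \sqrt{\mathcal{E}}$ and $\sigma^2 = \tfrac{1}{2}N_0$. Substituting these relations into (7-3-9) and
evaluating the integral yields

$$R_0 = \log_2 \left[ \frac{M}{1 + (M-1)e^{-\mathcal{E}/2N_0}} \right]
= \log_2 \left[ \frac{M}{1 + (M-1)e^{-R_w \gamma_b/2}} \right] \tag{7-3-11}$$

where $\mathcal{E}$ is the received energy per waveform, $R_w$ is the information rate in
bits/waveform, and $\gamma_b = \mathcal{E}_b/N_0$ is the SNR per bit.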

We should emphasize that the rate parameter $R_w$ has imbedded in it the
code rate $R_c$. For example, if M = 2 and the code is binary then $R_w = R_c$. More
generally, if the code is binary and $M = 2^\nu$ then each M-ary waveform conveys
$R_w = \nu R_c$ bits of information. It is also interesting to note that if the code is
binary and M = 2 then (7-3-11) reduces to

$$R_0 = \log_2 \left[ \frac{2}{1 + e^{-\mathcal{E}/2N_0}} \right], \qquad M = 2 \text{ orthogonal signals} \tag{7-3-12}$$

which is 3 dB worse than the cutoff rate for antipodal signals. If we set $R_w = R_0$
in (7-3-11) and solve for $\gamma_b$, we obtain

$$\gamma_b = \frac{2}{R_0} \ln \left( \frac{M - 1}{M\,2^{-R_0} - 1} \right) \tag{7-3-13}$$

Graphs of $R_0$ versus $\gamma_b$ for several values of M are illustrated in Fig. 7-3-3.
Note that the curve for any value of M saturates at $R_0 = \log_2 M$.
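As a numerical check on the relation between the cutoff rate of (7-3-11) and the SNR per bit obtained by setting $R_w = R_0$, the following minimal sketch evaluates both directions. The function names are hypothetical, all SNR values are linear ratios (not dB), and the inverse formula is only meaningful for $0 < R_0 < \log_2 M$.

```python
import math

def r0_orthogonal_coherent(M, e_over_n0):
    """Cutoff rate of (7-3-11) in bits/waveform; e_over_n0 is E/N0 (linear)."""
    return math.log2(M / (1.0 + (M - 1) * math.exp(-e_over_n0 / 2.0)))

def snr_per_bit_required(M, r0):
    """SNR per bit (linear) needed to operate at R_w = R0, for 0 < r0 < log2(M)."""
    return (2.0 / r0) * math.log((M - 1) / (M * 2.0 ** (-r0) - 1.0))

# Example: M = 4 orthogonal signals operated at R0 = 1 bit/waveform.
gamma_b = snr_per_bit_required(4, 1.0)
# Round trip: E/N0 = R_w * gamma_b with R_w = R0 = 1 should give back R0 = 1.
check = r0_orthogonal_coherent(4, 1.0 * gamma_b)
print(10.0 * math.log10(gamma_b), check)
```

For large E/N0 the value returned by `r0_orthogonal_coherent` approaches log2(M), which is the saturation noted above.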

It is also interesting to consider the limiting form of (7-3-11) in the limit as
$M \to \infty$. This yields

$$\lim_{M \to \infty} R_0 = \frac{\mathcal{E}}{2N_0 \ln 2} \quad \text{bits/waveform} \tag{7-3-14}$$


FIGURE 7-3-3 SNR per bit required to operate at a rate $R_0$ with M-ary
orthogonal signals detected coherently in an AWGN
channel.

Since $\mathcal{E} = P_{av}T$, where T is the time interval per waveform, it follows that

$$\lim_{M \to \infty} \frac{R_0}{T} = \frac{P_{av}}{2N_0 \ln 2} = \tfrac{1}{2} C_\infty \tag{7-3-15}$$

Hence, in the limit as $M \to \infty$, the cutoff rate is one-half of the capacity for the
infinite bandwidth AWGN channel. Alternatively, the substitution of $\mathcal{E} = R_0 \mathcal{E}_b$
into (7-3-14) yields $\gamma_b = 2 \ln 2$ (1.4 dB), which is the minimum SNR required to
operate at $R_0$ (as $M \to \infty$). Hence, signaling at a rate $R_0$ requires 3 dB more
power than the Shannon limit.
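The limiting statements above are easy to verify numerically. The sketch below is illustrative only: the function name is hypothetical, the value E/N0 = 4 is an arbitrary choice, and the Shannon limit is taken to be the well-known value $\gamma_b = \ln 2$ for the infinite bandwidth AWGN channel.

```python
import math

def r0_orthogonal_coherent(M, e_over_n0):
    """Cutoff rate of (7-3-11) in bits/waveform; e_over_n0 is E/N0 (linear)."""
    return math.log2(M / (1.0 + (M - 1) * math.exp(-e_over_n0 / 2.0)))

e_over_n0 = 4.0                                     # arbitrary illustrative value
limit = e_over_n0 / (2.0 * math.log(2.0))           # right-hand side of (7-3-14)
large_m = r0_orthogonal_coherent(2 ** 40, e_over_n0)

gamma_cutoff_db = 10.0 * math.log10(2.0 * math.log(2.0))   # gamma_b = 2 ln 2
gamma_shannon_db = 10.0 * math.log10(math.log(2.0))        # gamma_b = ln 2
gap_db = gamma_cutoff_db - gamma_shannon_db                # equals 10 log10(2)
print(limit, large_m, gamma_cutoff_db, gap_db)
```

The gap is exactly a factor of 2 in power, which is the 3 dB penalty quoted in the text.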

The value of $R_0$ given in (7-3-11) is based on the use of M-ary orthogonal
signals, which are clearly suboptimal when M is small. If we attempt to
maximize $R_0$ by selecting the best set of M waveforms, we should not be
surprised to find that the simplex set of waveforms is optimum. In fact, $R_0$ for
these optimum waveforms is simply given as

$$R_0 = \log_2 \left[ \frac{M}{1 + (M-1)\exp\left( -\dfrac{M\mathcal{E}}{2(M-1)N_0} \right)} \right] \tag{7-3-16}$$

If we compare this expression with (7-3-11) we observe that $R_0$ in (7-3-16)
simply reflects the fact that the simplex set is more energy-efficient by a factor
M/(M − 1).

In the case of noncoherent detection, the probability density functions
corresponding to signal-plus-noise and noise alone may be expressed as

$$\begin{aligned}
p_{s+n}(y) &= y\, e^{-(y^2 + a^2)/2}\, I_0(ay), & y \ge 0 \\
p_n(y) &= y\, e^{-y^2/2}, & y \ge 0
\end{aligned} \tag{7-3-17}$$

where, by definition, $a = \sqrt{2\mathcal{E}/N_0}$. The computation of $R_0$ given by (7-3-9) does
not yield a closed-form solution. Instead, the integral in (7-3-9) must be
evaluated numerically. Results for this case have been given by Jordan (1966)
and Bucher (1980). For example, the (normalized) cutoff rate $R_0$ for M-ary
orthogonal signals with noncoherent detection is shown in Fig. 7-3-4 for
M = 2, 4, 8, and 16. For purposes of comparison we also plot the cutoff rate for
hard-decision decoding (Q = M) of the M-ary symbols. In this case, we have

$$R_0 = \log_2 \frac{M}{\left[ \sqrt{1 - P_M} + \sqrt{(M-1)P_M} \right]^2} \tag{7-3-18}$$

where $P_M$ is the probability of a symbol error. For a relatively broad range of
rates, the difference between soft- and hard-decision decoding is approximately
2 dB.

FIGURE 7-3-4 SNR per bit required to operate at a rate $R_0$ with
M-ary orthogonal signals detected noncoherently
in an AWGN channel.

The most striking characteristic of the performance curves in Fig. 7-3-4 is 
that there is an optimum code rate for any given M. Unlike the case of
coherent detection, where the SNR per bit decreases monotonically with a 
decrease in code rate, the SNR per bit for noncoherent detection reaches a 
minimum in the vicinity of a normalized rate of 0.5, and increases for both high 
and low rates. The minimum is rather broad, so there is really a range of rates 
from 0.2 to 0.9 where the SNR per bit is within 1 dB of the minimum. This 
characteristic behavior in the performance with noncoherent detection is 
attributed to the nonlinear characteristic of the detector. 
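Since the integral in (7-3-9) has no closed form for the noncoherent pdfs of (7-3-17), it has to be evaluated numerically. The following is a minimal sketch of one way to do so, under stated assumptions: the function names are hypothetical, $I_0$ is computed from its power series, the integral is approximated by the trapezoidal rule on a truncated interval, and the step count and truncation point are illustrative choices, not values taken from the references cited above.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I0(x) from its series sum_k (x^2/4)^k / (k!)^2."""
    q = x * x / 4.0
    term = 1.0
    total = 1.0
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

def r0_orthogonal_noncoherent(M, e_over_n0, steps=4000):
    """Cutoff rate of (7-3-9) with the pdfs of (7-3-17); e_over_n0 is E/N0 (linear)."""
    a = math.sqrt(2.0 * e_over_n0)
    y_max = a + 12.0          # assumed truncation point; the integrand is negligible beyond it
    h = y_max / steps
    integral = 0.0
    # Trapezoidal rule; the integrand is zero at y = 0 and negligible at y_max,
    # so only interior points are summed.
    for n in range(1, steps):
        y = n * h
        # sqrt(p_{s+n}(y) p_n(y)) = y exp(-y^2/2) exp(-a^2/4) sqrt(I0(a y))
        integral += (y * math.exp(-0.5 * y * y - 0.25 * a * a)
                     * math.sqrt(bessel_i0(a * y)))
    integral *= h
    return math.log2(M / (1.0 + (M - 1) * integral ** 2))

for M in (2, 4, 8, 16):
    print(M, r0_orthogonal_noncoherent(M, 8.0))
```

With zero signal energy the two pdfs coincide, the integral equals 1, and the sketch returns a cutoff rate of essentially zero, which is a convenient sanity check.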


7-4 BIBLIOGRAPHICAL NOTES AND REFERENCES 

The pioneering work on channel characterization in terms of channel capacity 
and random coding was done by Shannon (1948a, b, 1949). Additional 
contributions were subsequently made by Gilbert (1952), Elias (1955),
Gallager (1965), Wyner (1965), Shannon et al. (1967), Forney (1968) and Viterbi
(1969). All of these early publications are contained in the IEEE Press book
entitled Key Papers in the Development of Information Theory, edited by 
Slepian (1974). 

The use of the cutoff rate parameter as a design criterion was proposed and 
developed by Wozencraft and Kennedy (1966) and by Wozencraft and Jacobs 
(1965). It was used by Jordan (1966) in the design of coded waveforms for 
M-ary orthogonal signals with coherent and noncoherent detection. Following
these pioneering works, the cutoff rate has been widely used as a design 
criterion for coded signals in a variety of different channel conditions. 


PROBLEMS 


7-1 Show that the following two relations are necessary and sufficient conditions for
the set of input probabilities $\{P(x_j)\}$ to maximize I(X; Y) and, thus, to achieve
capacity for a DMC:

$$I(x_j; Y) = C \quad \text{for all } j \text{ with } P(x_j) > 0$$
$$I(x_j; Y) \le C \quad \text{for all } j \text{ with } P(x_j) = 0$$

where C is the capacity of the channel and

$$I(x_j; Y) = \sum_{i=0}^{Q-1} P(y_i \mid x_j) \log \frac{P(y_i \mid x_j)}{P(y_i)}$$

7-2 Figure P7-2 illustrates an M-ary symmetric DMC with transition probabilities
P(y | x) = 1 − p when x = y = k for k = 0, 1, ..., M − 1, and P(y | x) = p/(M − 1)
when x ≠ y.

a Show that this channel satisfies the condition given in Problem 7-1 when
$P(x_k) = 1/M$.

b Determine and plot the channel capacity as a function of p. 

7-3 Determine the capacities of the channels shown in Fig. P7-3. 


FIGURE P7-3 





7-4 Consider the two channels with the transition probabilities as shown in Fig. P7-4.
Determine if equally probable input symbols maximize the information rate
through the channel.


FIGURE P7-4


FIGURE P7-6



7-5 A telephone channel has a bandwidth W = 3000 Hz and a signal-to-noise power
ratio of 400 (26 dB). Suppose we characterize the channel as a band-limited
AWGN waveform channel with $P_{av}/WN_0 = 400$.
a Determine the capacity of the channel in bits/s.

b Is the capacity of the channel sufficient to support the transmission of a speech
signal that has been sampled and encoded by means of logarithmic PCM?
c Usually, channel impairments other than additive noise limit the transmission
rate over the telephone channel to less than the channel capacity of the
equivalent band-limited AWGN channel considered in (a). Suppose that a
transmission rate of 0.7C is achievable in practice without channel encoding.
Which of the speech source encoding methods described in Section 3-5 provide
sufficient compression to fit the bandwidth restrictions of the telephone channel?

7-6 Consider the binary-input, quaternary-output DMC shown in Fig. P7-6. 
a Determine the capacity of the channel, 
b Show that this channel is equivalent to a BSC. 

7-7 Determine the channel capacity for the channel shown in Fig. P7-7. 

7-8 Consider a BSC with crossover probability of error p. Suppose that R is the
number of bits in a source code word that represents one of $2^R$ possible levels at
the output of a quantizer. Determine

a the probability that a code word transmitted over the BSC is received correctly;
b the probability of having at least one bit error in a code word transmitted over
the BSC;

c the probability of having $n_e$ or fewer bit errors in a code word;

d Evaluate the probabilities in (a), (b), and (c) for R = 5, p = 0.01, and $n_e$ = 5.


FIGURE P7-7


FIGURE P7-10

7-9 Show that, for a DMC, the average mutual information between a sequence
$X_1 X_2 \cdots X_n$ of channel inputs and the corresponding channel outputs satisfies the
condition

$$I(X_1 X_2 \cdots X_n;\, Y_1 Y_2 \cdots Y_n) \le \sum_{i=1}^{n} I(X_i; Y_i)$$

with equality if and only if the set of input symbols is statistically independent.

7-10 Figure P7-10 illustrates a binary erasure channel with transition probabilities
P(0 | 0) = P(1 | 1) = 1 − p and P(e | 0) = P(e | 1) = p. The probabilities for the
input symbols are P(X = 0) = α and P(X = 1) = 1 − α.
a Determine the average mutual information I(X; Y) in bits.
b Determine the value of α that maximizes I(X; Y), i.e., the channel capacity C in
bits/channel use, and plot C as a function of p for the optimum value of α.
c For the value of α found in (b), determine the mutual information I(x; y) =
I(0; 0), I(1; 1), I(0; e), and I(1; e).

7-11 Consider the binary-input, ternary-output channel with transition probabilities
shown in Fig. P7-11, where e denotes an erasure. For the AWGN channel, α and p
are defined as

$$\alpha = \frac{1}{\sqrt{\pi N_0}} \int_{-\beta}^{\beta} e^{-(x + \sqrt{\mathcal{E}_c})^2/N_0}\, dx, \qquad
p = \frac{1}{\sqrt{\pi N_0}} \int_{\beta}^{\infty} e^{-(x + \sqrt{\mathcal{E}_c})^2/N_0}\, dx$$

a Determine $R_Q$ for Q = 3 as a function of the probabilities α and p.
b The rate parameter $R_Q$ depends on the choice of the threshold β through the
probabilities α and p. For any $\mathcal{E}_c/N_0$, the value of β that maximizes $R_Q$ can be
determined by trial and error. For example, it can be shown that for $\mathcal{E}_c/N_0$
below 0 dB, $\beta_{opt} = 0.65\sqrt{\tfrac{1}{2}N_0}$; for $1 \le \mathcal{E}_c/N_0 \le 10$, $\beta_{opt}$ varies approximately
linearly between $0.65\sqrt{\tfrac{1}{2}N_0}$ and $1.0\sqrt{\tfrac{1}{2}N_0}$. By using $\beta = 0.65\sqrt{\tfrac{1}{2}N_0}$ for the entire
range of $\mathcal{E}_c/N_0$, plot $R_Q$ versus $\mathcal{E}_c/N_0$ and compare this result with $R_0$ (Q = ∞).


FIGURE P7-11


FIGURE P7-13


FIGURE P7-15

7-12 Find the capacity of the cascade connection of n binary-symmetric channels with 
the same crossover probability e. What is the capacity when the number of 
channels goes to infinity? 

7-13 Channels 1, 2, and 3 are shown in Fig. P7-13. 

a Find the capacity of channel 1 . What input distribution achieves capacity? 
b Find the capacity of channel 2. What input distribution achieves capacity? 
c Let C denote the capacity of the third channel and $C_1$ and $C_2$ represent the
capacities of the first and second channel. Which of the following relations holds
true and why?

$C < \tfrac{1}{2}(C_1 + C_2)$ (i)

$C = \tfrac{1}{2}(C_1 + C_2)$ (ii)

$C > \tfrac{1}{2}(C_1 + C_2)$ (iii)

7-14 Let C denote the capacity of a discrete memoryless channel with input alphabet
$\mathcal{X} = \{x_1, x_2, \ldots, x_N\}$ and output alphabet $\mathcal{Y} = \{y_1, y_2, \ldots, y_M\}$. Show that
$C \le \min\{\log M, \log N\}$.

7-15 The channel C (known as the Z channel) is shown in Fig. P7-15.
a Find the input probability distribution that achieves capacity.
b What is the input distribution and capacity for the special cases ε = 0, ε = 1, and
ε = 0.5?

c Show that if n such channels are cascaded, the resulting channel will be
equivalent to a Z channel with $\varepsilon_1 = \varepsilon^n$.
d What is the capacity of the equivalent Z channel when $n \to \infty$?

7-16 Find the capacity of an additive white Gaussian noise channel with a bandwidth
1 MHz, power 10 W, and noise power spectral density $\tfrac{1}{2}N_0$ = 10 4 W/Hz.

7-17 Channel $C_1$ is an additive white gaussian noise channel with a bandwidth W,
average transmitter power P, and noise power spectral density $\tfrac{1}{2}N_0$. Channel $C_2$ is
an additive gaussian noise channel with the same bandwidth and power as channel
$C_1$ but with noise power spectral density $\Phi_n(f)$. It is further assumed that the total
noise power for both channels is the same; that is,

$$\int_{-W}^{W} \Phi_n(f)\, df = \int_{-W}^{W} \tfrac{1}{2}N_0\, df = N_0 W$$

Which channel do you think has a larger capacity? Give an intuitive reasoning.
7-18 A discrete-time memoryless gaussian source with mean 0 and variance $\sigma^2$ is to be
transmitted over a binary-symmetric channel with crossover probability p.
a What is the minimum value of the distortion attainable at the destination
(distortion is measured in mean-squared error)?
b If the channel is a discrete-time memoryless additive gaussian noise channel
with input power P and noise power $P_n$, what is the minimum attainable
distortion?

c Now assume that the source has the same basic properties but is not
memoryless. Do you expect the distortion in transmission over the
binary-symmetric channel to be decreased or increased? Why?

7-19 X is a binary memoryless source with p(X = 0) = 0.3. This source is transmitted
over a binary-symmetric channel with crossover probability p = 0.1.
a Assume that the source is directly connected to the channel, i.e., no coding is
employed. What is the error probability at the destination?
b If coding is allowed, what is the minimum possible error probability in the
reconstruction of the source?

c For what values of p is reliable transmission possible (with coding, of course)?
7-20 Plot the capacity of an AWGN channel that employs binary antipodal signaling,
with optimal bit-by-bit detection at the receiver, as a function of $\mathcal{E}_b/N_0$. On the
same axis, plot the capacity of the same channel when binary orthogonal signaling
is employed.

7-21 In a coded communication system, M messages $1, 2, \ldots, M = 2^k$ are transmitted
by M baseband signals $x_1(t), x_2(t), \ldots, x_M(t)$, each of duration nT. The general
form of $x_i(t)$ is given by

$$x_i(t) = \sum_{j=0}^{n-1} f_{ij}(t - jT)$$

where $f_{ij}(t)$ can be either of the two signals $f_1(t)$ or $f_2(t)$, where $f_1(t) = f_2(t) = 0$ for
all $t \notin [0, T]$. We further assume that $f_1(t)$ and $f_2(t)$ have equal energy $\mathcal{E}$ and the
channel is ideal (no attenuation) with additive white gaussian noise of power
spectral density $\tfrac{1}{2}N_0$. This means that the received signal is r(t) = x(t) + n(t),
where x(t) is one of the $x_i(t)$ and n(t) represents the noise.

a With $f_1(t) = -f_2(t)$, show that N, the dimensionality of the signal space, satisfies
$N \le n$.

b Show that, in general, $N \le 2n$.
c With M = 2, show that, for general $f_1(t)$ and $f_2(t)$,

$$p(\text{error} \mid x_1(t) \text{ sent}) \le \int \cdots \int_{R^N} \sqrt{p(\mathbf{r} \mid \mathbf{x}_1)\, p(\mathbf{r} \mid \mathbf{x}_2)}\; d\mathbf{r}$$

where $\mathbf{r}$, $\mathbf{x}_1$, and $\mathbf{x}_2$ are the vector representations of r(t), $x_1(t)$, and $x_2(t)$ in the
N-dimensional space.
d Using the result of (c), show that, for general M,

$$p(\text{error} \mid x_m(t) \text{ sent}) \le \sum_{\substack{1 \le m' \le M \\ m' \ne m}} \int \cdots \int_{R^N} \sqrt{p(\mathbf{r} \mid \mathbf{x}_m)\, p(\mathbf{r} \mid \mathbf{x}_{m'})}\; d\mathbf{r}$$

e Show that

$$\int \cdots \int_{R^N} \sqrt{p(\mathbf{r} \mid \mathbf{x}_m)\, p(\mathbf{r} \mid \mathbf{x}_{m'})}\; d\mathbf{r} = \exp\left( -\frac{|\mathbf{x}_m - \mathbf{x}_{m'}|^2}{4N_0} \right)$$

and, therefore,

$$p(\text{error} \mid x_m(t) \text{ sent}) \le \sum_{\substack{1 \le m' \le M \\ m' \ne m}} \exp\left( -\frac{|\mathbf{x}_m - \mathbf{x}_{m'}|^2}{4N_0} \right)$$

8 


BLOCK AND 
CONVOLUTIONAL 
CHANNEL CODES 


In Chapter 7, we treated channel coding and decoding from a general 
viewpoint, and showed that even randomly selected codes on the average yield
performances close to the capacity of a channel. In the case of orthogonal
signals, we demonstrated that the channel capacity limit can be achieved as the
number of signals approaches infinity. 

In this chapter, we describe specific codes and evaluate their performance 
for the additive white gaussian noise channel. In particular, we treat two 
classes of codes, namely, linear block codes and convolutional codes. The code 
performance is evaluated for both hard-decision decoding and soft-decision 
decoding. 


8-1 LINEAR BLOCK CODES 

A block code consists of a set of fixed-length vectors called code words. The 
length of a code word is the number of elements in the vector and is denoted 
by n. The elements of a code word are selected from an alphabet of q 
elements. When the alphabet consists of two elements, 0 and 1, the code is a 
binary code and the elements of any code word are called bits. When the 
elements of a code word are selected from an alphabet having q elements 
(q > 2), the code is nonbinary. It is interesting to note that when q is a power
of 2, i.e., $q = 2^b$ where b is a positive integer, each q-ary element has an
equivalent binary representation consisting of b bits, and, thus, a nonbinary
code of block length N can be mapped into a binary code of block length
n = bN.

There are $2^n$ possible code words in a binary block code of length n. From
these $2^n$ code words, we may select $M = 2^k$ code words (k < n) to form a code.
Thus, a block of k information bits is mapped into a code word of length n
selected from the set of $M = 2^k$ code words. We refer to the resulting block
code as an (n, k) code, and the ratio $k/n \equiv R_c$ is defined to be the rate of the
code. More generally, in a code having q elements, there are $q^n$ possible code
words. A subset of $M = 2^k$ code words may be selected to transmit k-bit blocks
of information.

Besides the code rate parameter R c , an important parameter of a code word 
is its weight, which is simply the number of nonzero elements that it contains. 
In general, each code word has its own weight. The set of all weights in a code 
constitutes the weight distribution of the code. When all the M code words 
have equal weight, the code is called a fixed-weight code or a constant-weight
code. 

The encoding and decoding functions involve the arithmetic operations of 
addition and multiplication performed on code words. These arithmetic 
operations are performed according to the conventions of the algebraic field 
that has as its elements the symbols contained in the alphabet. For example, 
the symbols in a binary alphabet are 0 and 1; hence, the field has two elements. 
In general, a field F consists of a set of elements that has two arithmetic 
operations defined on its elements, namely, addition and multiplication, that 
satisfy the following properties (axioms). 


Addition 

1 The set F is closed under addition, i.e., if a, b ∈ F then a + b ∈ F.

2 Addition is associative, i.e., if a, b, and c are elements of F then
a + (b + c) = (a + b) + c.

3 Addition is commutative, i.e., a + b = b + a.

4 The set contains an element called zero that satisfies the condition
a + 0 = a.

5 Every element in the set has its own negative element. Hence, if b is an
element, its negative is denoted by −b. The subtraction of two elements, such
as a − b, is defined as a + (−b).


Multiplication 

1 The set F is closed under multiplication, i.e., if a, b ∈ F then ab ∈ F.

2 Multiplication is associative, i.e., a(bc) = (ab)c.

3 Multiplication is commutative, i.e., ab = ba.

4 Multiplication is distributive over addition, i.e., (a + b)c = ac + bc.

5 The set F contains an element, called the identity, that satisfies the
condition a(1) = a, for any element a ∈ F.

6 Every element of F, except zero, has an inverse. Hence, if b ∈ F (b ≠ 0)
then its inverse is defined as $b^{-1}$ and $bb^{-1} = 1$. The division of two elements,
such as a ÷ b, is defined as $ab^{-1}$.

We are very familiar with the field of real numbers and the field of complex 
numbers. These fields have an infinite number of elements. However, as 
indicated above, codes are constructed from fields with a finite number of 
elements. A finite field with q elements is generally called a Galois field and
denoted by GF(q).

Every field must have a zero element and a one element. Hence, the
simplest field is GF(2). In general, when q is a prime, we can construct the
finite field GF(q) consisting of the elements {0, 1, ..., q − 1}. The addition and
multiplication operations on the elements of GF(q) are defined modulo q and
denoted as (mod q). For example, the addition and multiplication tables for
GF(2) are


 + | 0  1          · | 0  1
---+------        ---+------
 0 | 0  1          0 | 0  0
 1 | 1  0          1 | 0  1

which are operations (mod 2). Similarly, the field GF(5) is a set consisting of 
the elements {0, 1, 2, 3, 4}. The addition and multiplication tables for GF(5) are 


 + | 0  1  2  3  4          · | 0  1  2  3  4
---+---------------        ---+---------------
 0 | 0  1  2  3  4          0 | 0  0  0  0  0
 1 | 1  2  3  4  0          1 | 0  1  2  3  4
 2 | 2  3  4  0  1          2 | 0  2  4  1  3
 3 | 3  4  0  1  2          3 | 0  3  1  4  2
 4 | 4  0  1  2  3          4 | 0  4  3  2  1

In general, the finite field GF(q) can be constructed only if q is a prime or a
power of a prime. When q is a prime, multiplication and addition are based on
modulo-q arithmetic as illustrated above. If $q = p^m$ where p is a prime and m is
any positive integer, it is possible to extend the field GF(p) to the field
GF($p^m$). This is called the extension field of GF(p). Multiplication and
addition of the elements in the extension field are based on modulo-p
arithmetic.
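The modulo-q construction for prime q can be illustrated with a short sketch that generates the addition and multiplication tables and finds multiplicative inverses by exhaustive search. The function names are hypothetical, and the sketch applies only to prime q; it does not cover the extension fields GF($p^m$) with m > 1.

```python
def gf_tables(q):
    """Addition and multiplication tables of GF(q) for prime q (arithmetic mod q)."""
    add = [[(a + b) % q for b in range(q)] for a in range(q)]
    mul = [[(a * b) % q for b in range(q)] for a in range(q)]
    return add, mul

def inverse(b, q):
    """Multiplicative inverse of a nonzero element b of GF(q), q prime, by search."""
    for c in range(1, q):
        if (b * c) % q == 1:
            return c
    raise ValueError("no inverse: b is zero or q is not prime")

add5, mul5 = gf_tables(5)
for row in add5:
    print(row)
for row in mul5:
    print(row)
print([inverse(b, 5) for b in range(1, 5)])
```

Each nonzero row of the multiplication table is a permutation of the field elements, which is why every nonzero element has an inverse when q is prime.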

With this brief introduction to the arithmetic operations that may be 
performed on the elements of code words, let us now consider some basic 
characteristics of block codes. 

Suppose $\mathbf{C}_i$ and $\mathbf{C}_j$ are any two code words in an (n, k) block code. A
measure of the difference between the code words is the number of
corresponding elements or positions in which they differ. This measure is called
the Hamming distance between the two code words and is denoted as $d_{ij}$.
Clearly, $d_{ij}$ for i ≠ j satisfies the condition $0 < d_{ij} \le n$. The smallest value of the
set $\{d_{ij}\}$ for the M code words is called the minimum distance of the code and is
denoted as $d_{\min}$. Since the Hamming distance is a measure of the separation
between pairs of code words, it is intimately related to the cross-correlation
coefficient between corresponding pairs of waveforms generated from the code
words. The relationship is discussed in Section 8-1-4.

Besides characterizing a code as being binary or nonbinary, one can also
describe it as either linear or nonlinear. Suppose $\mathbf{C}_i$ and $\mathbf{C}_j$ are two code words
in an (n, k) block code and let $\alpha_1$ and $\alpha_2$ be any two elements selected from
the alphabet. Then the code is said to be linear if and only if $\alpha_1 \mathbf{C}_i + \alpha_2 \mathbf{C}_j$ is
also a code word. This definition implies that a linear code must contain the
all-zero code word. Consequently, a constant-weight code is nonlinear.

Suppose we have a binary linear block code, and let $\mathbf{C}_i$, i = 1, 2, ..., M,
denote the M code words. For convenience, let $\mathbf{C}_1$ denote the all-zero code
word, i.e., $\mathbf{C}_1 = [0\; 0\; \ldots\; 0]$, and let $w_r$ denote the weight of the rth code word. It
follows that $w_r$ is simply the Hamming distance between the code words $\mathbf{C}_r$ and $\mathbf{C}_1$.
Thus, the distance $d_{1r} = w_r$. In general, the distance $d_{ij}$ between any pair of
code words $\mathbf{C}_i$ and $\mathbf{C}_j$ is simply equal to the weight of the code word formed by
taking the difference between $\mathbf{C}_i$ and $\mathbf{C}_j$. Since the code is linear, the difference
(equivalent to taking the modulo-2 sum for a binary code) between $\mathbf{C}_i$ and $\mathbf{C}_j$ is
also a code word having a weight included in the set $\{w_r\}$. Hence, the weight
distribution of a linear code completely characterizes the distance properties of
the code. The minimum distance of the code is, therefore,

$$d_{\min} = \min_{r,\, r \ne 1} \{w_r\} \tag{8-1-1}$$
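The equality between the minimum pairwise Hamming distance and the minimum nonzero weight of a linear code can be checked by brute force on a small example. The sketch below is illustrative only: the function names are hypothetical, and the (5, 2) binary code it uses is an arbitrary choice made for this illustration, not a code taken from the text.

```python
from itertools import combinations, product

def span_gf2(basis):
    """All modulo-2 linear combinations of the given basis vectors."""
    n = len(basis[0])
    words = set()
    for coeffs in product([0, 1], repeat=len(basis)):
        word = [0] * n
        for c, row in zip(coeffs, basis):
            if c:
                word = [(a + b) % 2 for a, b in zip(word, row)]
        words.add(tuple(word))
    return sorted(words)

def hamming_distance(u, v):
    """Number of positions in which the two words differ."""
    return sum(a != b for a, b in zip(u, v))

def weight(u):
    """Number of nonzero elements in the word."""
    return sum(1 for a in u if a != 0)

# Hypothetical (5, 2) binary linear code, chosen only for illustration.
code = span_gf2([[1, 0, 1, 1, 0], [0, 1, 0, 1, 1]])
d_min_pairs = min(hamming_distance(u, v) for u, v in combinations(code, 2))
d_min_weights = min(weight(u) for u in code if any(u))
print(code, d_min_pairs, d_min_weights)
```

For a linear code the second computation needs only M − 1 weight evaluations instead of M(M − 1)/2 pairwise comparisons, which is the practical value of the weight-based expression for the minimum distance.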

A number of elementary concepts from linear algebra are particularly useful
in dealing with linear block codes. Specifically, the set of all n-tuples (vectors
with n elements) form a vector space S. If we select a set of k < n linearly
independent vectors from S and from these construct the set of all linear
combinations of these vectors, the resulting set forms a subspace of S, say $S_c$,
of dimension k. Any set of k linearly independent vectors in the subspace $S_c$
constitutes a basis. Now consider the set of vectors in S that are orthogonal to
every vector in a basis for $S_c$ (and, hence, orthogonal to all vectors in $S_c$). This
set of vectors is also a subspace of S and is called the null space of $S_c$. If the
dimension of $S_c$ is k, the dimension of the null space is n − k.

Expressed in terms appropriate for binary block codes, the vector space S
consists of the $2^n$ binary valued n-tuples. The linear (n, k) code is a set of $2^k$
n-tuples called code words, which forms a subspace $S_c$ over the field of two
elements. Since there are $2^k$ code words in $S_c$, a basis for $S_c$ has k code words.
That is, k linearly independent code words are required to construct $2^k$ linear
combinations, thus generating the entire code. The null space of $S_c$ is another
linear code, which consists of $2^{n-k}$ code words of block length n and n − k
information bits. Its dimension is n − k. In Section 8-1-1, we consider these
relationships in greater detail.





8-1-1 The Generator Matrix and the Parity Check Matrix 

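Let $x_{m1}, x_{m2}, \ldots, x_{mk}$ denote the k information bits encoded into the code
word $\mathbf{C}_m$. Throughout this chapter, we follow the established convention in
coding of representing code words as row vectors. Thus, the vector of k
information bits into the encoder is denoted by

$$\mathbf{X}_m = [x_{m1}\; x_{m2}\; \cdots\; x_{mk}]$$

and the output of the encoder is the vector

$$\mathbf{C}_m = [c_{m1}\; c_{m2}\; \cdots\; c_{mn}]$$

The encoding operation performed in a linear binary block encoder can be
represented by a set of n equations of the form

$$c_{mj} = x_{m1}g_{1j} + x_{m2}g_{2j} + \cdots + x_{mk}g_{kj}, \qquad j = 1, 2, \ldots, n \tag{8-1-2}$$

where $g_{ij} = 0$ or 1 and $x_{mi}g_{ij}$ represents the product of $x_{mi}$ and $g_{ij}$. The linear
equations (8-1-2) may also be represented in a matrix form as

$$\mathbf{C}_m = \mathbf{X}_m \mathbf{G} \tag{8-1-3}$$

where $\mathbf{G}$, called the generator matrix of the code, is

$$\mathbf{G} = \begin{bmatrix} \leftarrow \mathbf{g}_1 \rightarrow \\ \leftarrow \mathbf{g}_2 \rightarrow \\ \vdots \\ \leftarrow \mathbf{g}_k \rightarrow \end{bmatrix}
= \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1n} \\ g_{21} & g_{22} & \cdots & g_{2n} \\ \vdots & \vdots & & \vdots \\ g_{k1} & g_{k2} & \cdots & g_{kn} \end{bmatrix} \tag{8-1-4}$$

Note that any code word is simply a linear combination of the vectors $\{\mathbf{g}_i\}$ of $\mathbf{G}$,
i.e.,

$$\mathbf{C}_m = x_{m1}\mathbf{g}_1 + x_{m2}\mathbf{g}_2 + \cdots + x_{mk}\mathbf{g}_k \tag{8-1-5}$$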


Since the linear (n, k) code with $2^k$ code words is a subspace of dimension k,
the row vectors $\{\mathbf{g}_i\}$ of the generator matrix $\mathbf{G}$ must be linearly independent,
i.e., they must span a subspace of k dimensions. In other words, the $\{\mathbf{g}_i\}$ must
be a basis for the (n, k) code. We note that the set of basis vectors is not
unique, and, hence, $\mathbf{G}$ is not unique. We also note that, since the subspace has
dimension k, the rank of $\mathbf{G}$ is k.

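Any generator matrix of an (n, k) code can be reduced by row operations
(and column permutations) to the "systematic form,"

$$\mathbf{G} = [\mathbf{I}_k \mid \mathbf{P}] = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 & p_{11} & p_{12} & \cdots & p_{1,n-k} \\
0 & 1 & 0 & \cdots & 0 & p_{21} & p_{22} & \cdots & p_{2,n-k} \\
\vdots & & & & \vdots & \vdots & & & \vdots \\
0 & 0 & 0 & \cdots & 1 & p_{k1} & p_{k2} & \cdots & p_{k,n-k}
\end{bmatrix} \tag{8-1-6}$$

where $\mathbf{I}_k$ is the k × k identity matrix and $\mathbf{P}$ is a k × (n − k) matrix that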





determines the n - k redundant bits or parity check bits. Note that a generator 
matrix of the systematic form generates a linear block code in which the first k 
bits of each code word are identical to the information bits to be transmitted, 
and the remaining n - k bits of each code word are linear combinations of the 
k information bits. These n - k redundant bits are called parity check bits. The 
resulting (n, k) code is called a systematic code. 

An (n, k) code generated by a generator matrix that is not in the systematic
form (8-1-6) is called nonsystematic. However, such a generator matrix is 
equivalent to a generator matrix of the systematic form in the sense that one 
can be obtained from the other by elementary row operations and column 
permutations. The two ( n , k) linear codes generated by the two equivalent 
generator matrices are said to be equivalent , and one can be obtained from the 
other by a permutation of the places of every element. Thus, every linear 
(n, k) code is equivalent to a linear systematic (n, k) code. 


Example 8-1-1 

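Consider a (7, 4) code with generator matrix

$$\mathbf{G} = \begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 1
\end{bmatrix}$$

A typical code word may be expressed as

$$\mathbf{C}_m = [x_{m1}\; x_{m2}\; x_{m3}\; x_{m4}\; c_{m5}\; c_{m6}\; c_{m7}] \tag{8-1-7}$$

where the $\{x_{mj}\}$ represent the four information bits and the $\{c_{mj}\}$ represent
the three parity check bits given by

$$\begin{aligned}
c_{m5} &= x_{m1} + x_{m2} + x_{m3} \\
c_{m6} &= x_{m2} + x_{m3} + x_{m4} \\
c_{m7} &= x_{m1} + x_{m2} + x_{m4}
\end{aligned} \tag{8-1-8}$$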

A linear systematic (n, k) binary block encoder may be implemented by 
using a k-bit shift register and n - k modulo-2 adders tied to the appropriate 
stages of the shift register. The n - k adders generate the parity check bits, 
which are subsequently stored temporarily in a second shift register of length 
n - k. The k-bit block of information bits is shifted into the k-bit shift 
register and the n - k parity check bits are computed. Then the k information 
bits followed by the n - k parity check bits are shifted out of the two shift 
registers and fed to the modulator. This encoding is illustrated in Fig. 8-1-1 
for the (7, 4) code of Example 8-1-1. 

FIGURE 8-1-1 A linear shift register for generating a (7, 4) binary code. 
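
The encoding rule C = XG for the systematic (7, 4) code of Example 8-1-1 can be sketched in a few lines of Python. This is only an illustration: the function name `encode` and the list-of-lists layout are assumptions made here, while the matrix is the G of (8-1-7).

```python
# Generator matrix (8-1-7) of the (7, 4) code, G = [I_4 | P].
G = [
    [1, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1, 1],
]

def encode(x, G):
    """Return the code word C = xG, with all arithmetic modulo 2."""
    n = len(G[0])
    return [sum(x[i] * G[i][j] for i in range(len(G))) % 2 for j in range(n)]

x = [0, 1, 1, 0]
c = encode(x, G)
print(c)  # [0, 1, 1, 0, 0, 0, 1]
```

The first four output bits repeat the information bits, and the last three are the parity check bits of (8-1-8).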

Associated with any linear (n, k) code is the dual code of dimension n - k. 
The dual code is a linear (n, n - k) code with $2^{n-k}$ code vectors, which is 
the null space of the (n, k) code. The generator matrix for the dual code, 
denoted by H, consists of n - k linearly independent code vectors selected 
from the null space. Any code word $C_m$ of the (n, k) code is orthogonal to 
any code word in the dual code. Hence, any code word of the (n, k) code is 
orthogonal to every row of the matrix H, i.e., 

\[
C_m H' = 0 \tag{8-1-9}
\]

where 0 denotes an all-zero row vector with n - k elements, and $C_m$ is a code 
word of the (n, k) code. Since (8-1-9) holds for every code word of the (n, k) 
code, it follows that 

\[
G H' = 0 \tag{8-1-10}
\]

where 0 is now a k x (n - k) matrix with all-zero elements. 

Now suppose that the linear (n, k) code is systematic and its generator 
matrix G is given by the systematic form (8-1-6). Then, since GH' = 0, it 
follows that 

\[
H = [\, -P' \mid I_{n-k} \,] \tag{8-1-11}
\]

The negative sign in (8-1-11) may be dropped when dealing with binary codes, 
since modulo-2 subtraction is identical to modulo-2 addition. 


Example 8-1-2 

For the systematic (7, 4) code generated by matrix G given by (8-1-7), we 
have, according to (8-1-11), the matrix H in the form 

\[
H =
\begin{bmatrix}
1 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1
\end{bmatrix}
\tag{8-1-12}
\]

Now, the product $C_m H'$ yields the three equations 

\[
\begin{aligned}
x_{m1} + x_{m2} + x_{m3} + c_{m5} &= 0 \\
x_{m2} + x_{m3} + x_{m4} + c_{m6} &= 0 \\
x_{m1} + x_{m2} + x_{m4} + c_{m7} &= 0
\end{aligned}
\tag{8-1-13}
\]

Thus, we observe that the product $C_m H'$ is equivalent to adding the parity 
check bits to the corresponding linear combinations of the information bits 
used to compute $c_{mj}$, j = 5, 6, 7. That is, (8-1-13) are equivalent to (8-1-8). 

The matrix H may be used by the decoder to check that a received code 
word Y satisfies the condition (8-1-13), i.e., YH' = 0. In so doing, the 
decoder checks the received parity check bits with the corresponding linear 
combination of the bits $y_1$, $y_2$, $y_3$, and $y_4$ that formed the parity 
check bits at the transmitter. It is, therefore, appropriate to call H the 
parity check matrix associated with the (n, k) code. 
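
As a rough illustration of (8-1-11) and of the check YH' = 0, the following Python sketch builds H from the P part of the (7, 4) generator matrix and evaluates the product for a valid and a corrupted word. The helper name `syndrome` and the single-bit error pattern are assumptions chosen for the example.

```python
# P is the k x (n - k) parity part of G = [I_4 | P] in (8-1-7).
P = [[1, 0, 1], [1, 1, 1], [1, 1, 0], [0, 1, 1]]
k, r = 4, 3  # r = n - k parity check bits

# G = [I_k | P] and H = [P' | I_r]; the minus sign in (8-1-11) vanishes modulo 2.
G = [[int(i == j) for j in range(k)] + P[i] for i in range(k)]
H = [[P[i][j] for i in range(k)] + [int(j == t) for t in range(r)] for j in range(r)]

def syndrome(y, H):
    """Return yH' modulo 2: one parity check result per row of H."""
    return [sum(a * b for a, b in zip(y, row)) % 2 for row in H]

code_word = [0, 1, 1, 0, 0, 0, 1]
corrupted = [0, 1, 0, 0, 0, 0, 1]   # third bit flipped
print(syndrome(code_word, H))        # [0, 0, 0]
print(syndrome(corrupted, H))        # [1, 1, 0], the third column of H
```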

We make the following observation regarding the relation of the minimum 
distance of a code to its parity check matrix H. The product $C_m H'$ with 
$C_m \neq 0$ represents a linear combination of the n columns of H. Since 
$C_m H' = 0$, the column vectors of H are linearly dependent. Suppose $C_j$ 
denotes the minimum weight code word of a linear (n, k) code. It must satisfy 
the condition $C_j H' = 0$. Since the minimum weight is equal to the minimum 
distance, it follows that $d_{min}$ of the columns of H are linearly dependent. 
Moreover, every set of $d_{min} - 1$ columns of H is linearly independent, since 
a smaller dependent set would correspond to a code word of weight less than 
$d_{min}$. Since the rank of H is at most n - k, we have $n - k \geq d_{min} - 1$. 
Therefore, $d_{min}$ is upper-bounded as 

\[
d_{min} \leq n - k + 1 \tag{8-1-14}
\]
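
The link between $d_{min}$ and the columns of H can be checked numerically for the (7, 4) code of Example 8-1-2. The sketch below is illustrative only: it searches by brute force for the smallest set of columns of H that sums to zero modulo 2 (for a binary code, a minimal dependent set is exactly such a set), and the function name `smallest_dependent_set` is an assumption.

```python
from itertools import combinations

# Parity check matrix (8-1-12) of the (7, 4) code.
H = [
    [1, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]
n, k = 7, 4
columns = list(zip(*H))  # the n columns of H, each with n - k entries

def smallest_dependent_set(columns):
    """Size of the smallest set of columns that sums to zero modulo 2."""
    for size in range(1, len(columns) + 1):
        for subset in combinations(columns, size):
            if all(sum(entries) % 2 == 0 for entries in zip(*subset)):
                return size
    return None

d_min = smallest_dependent_set(columns)
print(d_min, n - k + 1)  # 3 4
```

Here $d_{min} = 3$, which respects the bound $d_{min} \leq n - k + 1 = 4$.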

Given a linear binary (n, k) code with minimum distance $d_{min}$, we can 
construct a linear binary (n + 1, k) code by appending one additional parity 
check bit to each code word. The check bit is usually selected to be a check bit 
on all the bits in the code word. Thus the added check bit is a 0 if the original 
code word has an even number of 1s and it is a 1 if the code word has an odd 
number of 1s. Consequently, if the minimum weight and, hence, the minimum 
distance of the code is odd, the added parity check bit increases the minimum 
distance by 1. We call the (n + 1, k) code an extended code. Its parity check 
matrix is 

\[
H_e =
\left[
\begin{array}{c|c}
H & \begin{matrix} 0 \\ 0 \\ \vdots \\ 0 \end{matrix} \\
\hline
1 \;\; 1 \;\; 1 \;\; \cdots \;\; 1 & 1
\end{array}
\right]
\tag{8-1-15}
\]

where H is the parity check matrix of the original code. 
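
The effect of the overall parity bit can be illustrated on the (7, 4) code of Example 8-1-1, whose minimum distance is odd. The following sketch, with helper names `encode` and `min_distance` chosen here for illustration, lists all 16 code words, appends the check bit, and compares minimum distances.

```python
from itertools import product

# Generator matrix (8-1-7) of the (7, 4) code.
G = [
    [1, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1, 1],
]

def encode(x):
    """Code word xG modulo 2, computed column by column."""
    return [sum(xi * gi for xi, gi in zip(x, col)) % 2 for col in zip(*G)]

def min_distance(code):
    """For a linear code, the minimum distance is the minimum nonzero weight."""
    return min(sum(c) for c in code if any(c))

code = [encode(x) for x in product([0, 1], repeat=4)]
extended = [c + [sum(c) % 2] for c in code]  # append the overall parity bit
print(min_distance(code), min_distance(extended))  # 3 4
```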
A systematic (n, k) code can also be shortened by setting a number of the 
information bits to zero. That is, a linear (n, k) code consisting of k 
information bits and n - k check bits can be shortened into an (n - l, k - l) 
linear code by setting the first l bits to zero. These l bits are not transmitted. 
The n - k check bits are computed in the usual manner, as in the original code. 
Since 

\[
C_m = X_m G
\]

the effect of setting the first l bits of $X_m$ to 0 is equivalent to reducing the 
number of rows of G by removing the first l rows. Equivalently, since 

\[
C_m H' = 0
\]

we may remove the first l columns of H. The shortened (n - l, k - l) code 
consists of $2^{k-l}$ code words. The minimum distance of these $2^{k-l}$ code 
words is at least as large as the minimum distance of the original (n, k) code. 


8-1-2 Some Specific Linear Block Codes 

In this subsection, we shall briefly describe three types of linear block codes 
that are frequently encountered in practice and list their important parameters. 

Hamming Codes There are both binary and nonbinary Hamming codes. 
We limit our discussion to the properties of binary Hamming codes. These 
comprise a class of codes with the property that 

\[
(n, k) = (2^m - 1, \; 2^m - 1 - m) \tag{8-1-16}
\]

where m is any positive integer. For example, if m = 3, we have a (7, 4) code. 

The parity check matrix H of a Hamming code has a special property that 
allows us to describe the code rather easily. Recall that the parity check matrix 
of an (n, k) code has n - k rows and n columns. For the binary (n, k) 
Hamming code, the $n = 2^m - 1$ columns consist of all possible binary vectors 
with n - k = m elements, except the all-zero vector. For example, the (7, 4) 
code considered in Examples 8-1-1 and 8-1-2 is a Hamming code. Its parity 
check matrix consists of the seven column vectors (001), (010), (011), (100), 
(101), (110), (111). 


If we desire to generate a systematic Hamming code, the parity check 
matrix H can be easily arranged in the systematic form (8-1-11). Then the 
corresponding generator matrix G can be obtained from (8-1-11). 

We make the observation that no two columns of H are linearly dependent, 
for otherwise the two columns would be identical. However, for m > 1, it is 
possible to find three columns of H that add to zero. Consequently, $d_{min} = 3$ 
for an (n, k) Hamming code. 

By adding an overall parity bit, a Hamming (n, k) code can be modified to 
yield an (n + 1, k) code with $d_{min} = 4$. On the other hand, an (n, k) Hamming 
code may be shortened to (n - l, k - l) by removing l rows of its generator 
matrix G or, equivalently, by removing l columns of its parity check matrix H. 
The weight distribution for the class of Hamming (n, k) codes is known and 
is expressed in compact form by the weight enumerating polynomial 

\[
A(z) = \sum_{i=0}^{n} A_i z^i
     = \frac{1}{n+1}\left[ (1+z)^n + n (1+z)^{(n-1)/2} (1-z)^{(n+1)/2} \right]
\tag{8-1-17}
\]

where $A_i$ is the number of code words of weight i. 
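
To make (8-1-17) concrete, the sketch below expands the weight enumerating polynomial with integer coefficient lists and returns the weight distribution $A_0, \ldots, A_n$. The helper names (`poly_mul`, `binomial_poly`, `hamming_weights`) are illustrative assumptions, not notation from the text.

```python
from math import comb

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def binomial_poly(sign, e):
    """Coefficients of (1 + sign*z)**e, lowest power first."""
    return [comb(e, i) * sign**i for i in range(e + 1)]

def hamming_weights(m):
    """Weight distribution A_0..A_n of the (2^m - 1, 2^m - 1 - m) Hamming code."""
    n = 2**m - 1
    first = binomial_poly(1, n)
    second = poly_mul(binomial_poly(1, (n - 1) // 2), binomial_poly(-1, (n + 1) // 2))
    return [(a + n * b) // (n + 1) for a, b in zip(first, second)]

print(hamming_weights(3))  # [1, 0, 0, 7, 7, 0, 0, 1]
```

For m = 3 the result says that the (7, 4) Hamming code has seven code words of weight 3, seven of weight 4, and one each of weight 0 and 7, for 16 code words in total.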


Hadamard Codes A Hadamard code is obtained by selecting as code 
words the rows of a Hadamard matrix. A Hadamard matrix $M_n$ is an n x n 
matrix (n an even integer) of 1s and 0s with the property that any row differs 
from any other row in exactly n/2 positions.† One row of the matrix contains all 
zeros. The other rows contain n/2 zeros and n/2 ones. 

For n = 2, the Hadamard matrix is 

\[
M_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \tag{8-1-18}
\]

Furthermore, from $M_n$, we can generate the Hadamard matrix $M_{2n}$ according 
to the relation 

\[
M_{2n} = \begin{bmatrix} M_n & M_n \\ M_n & \bar{M}_n \end{bmatrix} \tag{8-1-19}
\]

where $\bar{M}_n$ denotes the complement (0s replaced by 1s and vice versa) of $M_n$. 
Thus, by substituting (8-1-18) into (8-1-19), we obtain 

\[
M_4 =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 \\
0 & 1 & 1 & 0
\end{bmatrix}
\tag{8-1-20}
\]

The complement of $M_4$ is 

\[
\bar{M}_4 =
\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 0 & 1
\end{bmatrix}
\tag{8-1-21}
\]

Now the rows of $M_4$ and $\bar{M}_4$ form a linear binary code of block length n = 4 
having 2n = 8 code words. The minimum distance of the code is $d_{min} = n/2 = 2$. 

By repeated application of (8-1-19), we can generate Hadamard codes with 
block length $n = 2^m$, $k = \log_2 2n = \log_2 2^{m+1} = m + 1$, and 
$d_{min} = n/2 = 2^{m-1}$, where m is a positive integer. In addition to the 
important special case where $n = 2^m$, Hadamard codes of other block lengths 
are possible, but the codes are not linear. 
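
The recursion (8-1-19) is easy to sketch in code. The following Python fragment is an illustration under the assumption that n is a power of 2; the function names `hadamard`, `hadamard_code`, and `min_distance` are chosen here and do not come from the text.

```python
from itertools import combinations

def hadamard(n):
    """Return M_n (n a power of 2), built from M_2 by the doubling rule (8-1-19)."""
    M = [[0, 0], [0, 1]]
    while len(M) < n:
        comp = [[1 - b for b in row] for row in M]
        M = [row + row for row in M] + [row + crow for row, crow in zip(M, comp)]
    return M

def hadamard_code(n):
    """Code words: the rows of M_n together with the rows of its complement."""
    M = hadamard(n)
    return M + [[1 - b for b in row] for row in M]

def min_distance(code):
    """Smallest Hamming distance over all pairs of distinct code words."""
    return min(sum(a != b for a, b in zip(u, v)) for u, v in combinations(code, 2))

print(hadamard(4))                     # rows 0000, 0101, 0011, 0110 as in (8-1-20)
print(min_distance(hadamard_code(8)))  # 4, i.e. n/2
```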


† Sometimes the elements of the Hadamard matrix are denoted by +1 and -1. Then the rows 
of the Hadamard matrix are mutually orthogonal. We also note that the $M = 2^k$ signal waveforms, 
constructed from Hadamard code words by mapping each bit in a code word into a binary PSK 
signal, are orthogonal. 
TABLE 8-1-1 WEIGHT DISTRIBUTION OF GOLAY (23, 12) AND EXTENDED GOLAY (24, 12) CODES 

              Number of code words 
Weight     (23, 12) code     (24, 12) code 
   0              1                 1 
   7            253                 0 
   8            506               759 
  11           1288                 0 
  12           1288              2576 
  15            506                 0 
  16            253               759 
  23              1                 0 
  24              0                 1 

Source: Peterson and Weldon (1972). 


Golay Code The Golay code is a binary linear (23, 12) code with $d_{min} = 7$. 
The extended Golay code obtained by adding an overall parity bit to the (23, 12) 
code is a binary linear (24, 12) code with $d_{min} = 8$. Table 8-1-1 lists the 
weight distribution of the code words in the Golay (23, 12) and the extended 
Golay (24, 12) codes. We discuss the generation of the Golay code in Section 8-1-3. 

8-1-3 Cyclic Codes 

Cyclic codes are a subset of the class of linear codes that satisfy the following 
cyclic shift property: if $C = [c_{n-1} \; c_{n-2} \cdots c_1 \; c_0]$ is a code word of a 
cyclic code then $[c_{n-2} \; c_{n-3} \cdots c_0 \; c_{n-1}]$, obtained by a cyclic shift 
of the elements of C, is also a code word. That is, all cyclic shifts of C are 
code words. As a consequence of the cyclic property, the codes possess a 
considerable amount of structure which can be exploited in the encoding and 
decoding operations. A number of efficient encoding and hard-decision decoding 
algorithms have been devised for cyclic codes that make it possible to implement 
long block codes with a large number of code words in practical communications 
systems. A description of specific algorithms is beyond the scope of this book. 
Our primary objective is to briefly describe a number of characteristics of 
cyclic codes. 

In dealing with cyclic codes, it is convenient to associate with a code word 
$C = [c_{n-1} \; c_{n-2} \cdots c_1 \; c_0]$ a polynomial C(p) of degree $\leq n - 1$, 
defined as 

\[
C(p) = c_{n-1} p^{n-1} + c_{n-2} p^{n-2} + \cdots + c_1 p + c_0 \tag{8-1-22}
\]

For a binary code, each of the coefficients of the polynomial is either zero or 
one. 

Now suppose we form the polynomial 

\[
pC(p) = c_{n-1} p^{n} + c_{n-2} p^{n-1} + \cdots + c_1 p^2 + c_0 p
\]
This polynomial cannot represent a code word, since its degree may be equal 
to n (when $c_{n-1} = 1$). However, if we divide pC(p) by $p^n + 1$, we obtain 

\[
\frac{pC(p)}{p^n + 1} = c_{n-1} + \frac{C_1(p)}{p^n + 1} \tag{8-1-23}
\]

where 

\[
C_1(p) = c_{n-2} p^{n-1} + c_{n-3} p^{n-2} + \cdots + c_0 p + c_{n-1}
\]

Note that the polynomial $C_1(p)$ represents the code word 
$C_1 = [c_{n-2} \cdots c_0 \; c_{n-1}]$, which is just the code word C shifted 
cyclically by one position. Since $C_1(p)$ is the remainder obtained by dividing 
pC(p) by $p^n + 1$, we say that 

\[
C_1(p) = pC(p) \bmod (p^n + 1) \tag{8-1-24}
\]

In a similar manner, if C(p) represents a code word in a cyclic code then 
$p^i C(p) \bmod (p^n + 1)$ is also a code word of the cyclic code. Thus we may write 

\[
p^i C(p) = Q(p)(p^n + 1) + C_i(p) \tag{8-1-25}
\]

where the remainder polynomial $C_i(p)$ represents a code word of the cyclic 
code and Q(p) is the quotient. 
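
The statement that a cyclic shift equals multiplication by p modulo $p^n + 1$ can be sketched with integers used as bit masks, where bit i holds the coefficient of $p^i$. This representation and the function name `cyclic_shift` are assumptions made for the illustration.

```python
def cyclic_shift(c, n):
    """Return p*C(p) mod (p^n + 1) for a binary polynomial stored as an integer."""
    shifted = c << 1                 # multiply by p
    if (shifted >> n) & 1:           # a p^n term appeared (c_{n-1} = 1)
        shifted ^= (1 << n) | 1      # add p^n + 1, which replaces p^n by 1
    return shifted

c = 0b1101000                        # C(p) = p^6 + p^5 + p^3
c1 = cyclic_shift(c, 7)
print(format(c1, "07b"))             # 1010001
```

The leading bit of 1101000 wraps around to the last position, which is exactly the one-position cyclic shift described by (8-1-24).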

We can generate a cyclic code by using a generator polynomial g(p) of 
degree n - k. The generator polynomial of an (n, k) cyclic code is a factor of 
$p^n + 1$ and has the general form 

\[
g(p) = p^{n-k} + g_{n-k-1} p^{n-k-1} + \cdots + g_1 p + 1 \tag{8-1-26}
\]

We also define a message polynomial X(p) as 

\[
X(p) = x_{k-1} p^{k-1} + x_{k-2} p^{k-2} + \cdots + x_1 p + x_0 \tag{8-1-27}
\]

where $[x_{k-1} \; x_{k-2} \cdots x_1 \; x_0]$ represent the k information bits. Clearly, 
the product X(p)g(p) is a polynomial of degree less than or equal to n - 1, which 
may represent a code word. We note that there are $2^k$ polynomials $\{X_i(p)\}$, 
and, hence, there are $2^k$ possible code words that can be formed from a given g(p). 
Suppose we denote these code words as 

\[
C_m(p) = X_m(p) g(p), \qquad m = 1, 2, \ldots, 2^k \tag{8-1-28}
\]

To show that the code words in (8-1-28) satisfy the cyclic property, consider 
any code word C(p) in (8-1-28). A cyclic shift of C(p) produces 

\[
C_1(p) = pC(p) + c_{n-1}(p^n + 1) \tag{8-1-29}
\]

and, since g(p) divides both $p^n + 1$ and C(p), it also divides $C_1(p)$, i.e., 
$C_1(p)$ can be represented as 

\[
C_1(p) = X_1(p) g(p)
\]

Therefore, a cyclic shift of any code word C(p) generated by (8-1-28) yields 
another code word. 

From the above, we see that code words possessing the cyclic property can 
be generated by multiplying the $2^k$ message polynomials with a unique 
polynomial g(p), called the generator polynomial of the (n, k) cyclic code, 
which divides $p^n + 1$ and has degree n - k. The cyclic code generated in this 
manner is a subspace $S_c$ of the vector space S. The dimension of $S_c$ is k. 


Example 8-1-3 

Consider a code with block length n = 7. The polynomial $p^7 + 1$ has the 
following factors: 

\[
p^7 + 1 = (p + 1)(p^3 + p^2 + 1)(p^3 + p + 1) \tag{8-1-30}
\]

To generate a (7, 4) cyclic code, we may take as a generator polynomial one 
of the following two polynomials: 

\[
\begin{aligned}
g_1(p) &= p^3 + p^2 + 1 \\
g_2(p) &= p^3 + p + 1
\end{aligned}
\tag{8-1-31}
\]

The codes generated by $g_1(p)$ and $g_2(p)$ are equivalent. The code words in 
the (7, 4) code generated by $g_1(p) = p^3 + p^2 + 1$ are given in Table 8-1-2. 
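
The factorization (8-1-30) and the cyclic property of the code generated by $g_1(p)$ can be checked with a short sketch. Polynomials over GF(2) are stored as integers (bit i is the coefficient of $p^i$); this encoding and the names `gf2_mul` and `rotate` are assumptions made for the example.

```python
def gf2_mul(a, b):
    """Multiply two GF(2) polynomials stored as integers (bit i <-> p^i)."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

n, k = 7, 4
g1 = 0b1101   # p^3 + p^2 + 1
g2 = 0b1011   # p^3 + p + 1

# (p + 1) g1(p) g2(p) should equal p^7 + 1, as in (8-1-30).
product_of_factors = gf2_mul(gf2_mul(0b11, g1), g2)

# All 2^k code words C(p) = X(p) g1(p), as in (8-1-28).
code = {gf2_mul(x, g1) for x in range(2**k)}

def rotate(c):
    """Cyclic shift by one position of an n-bit word."""
    return ((c << 1) | (c >> (n - 1))) & ((1 << n) - 1)

closed = all(rotate(c) in code for c in code)
print(product_of_factors == (1 << n) | 1, len(code), closed)  # True 16 True
```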


TABLE 8-1-2 (7, 4) CYCLIC CODE 
Generator Polynomial: g_1(p) = p^3 + p^2 + 1 

Information bits        Code words 
p^3 p^2 p^1 p^0         p^6 p^5 p^4 p^3 p^2 p^1 p^0 
 0   0   0   0           0   0   0   0   0   0   0 
 0   0   0   1           0   0   0   1   1   0   1 
 0   0   1   0           0   0   1   1   0   1   0 
 0   0   1   1           0   0   1   0   1   1   1 
 0   1   0   0           0   1   1   0   1   0   0 
 0   1   0   1           0   1   1   1   0   0   1 
 0   1   1   0           0   1   0   1   1   1   0 
 0   1   1   1           0   1   0   0   0   1   1 
 1   0   0   0           1   1   0   1   0   0   0 
 1   0   0   1           1   1   0   0   1   0   1 
 1   0   1   0           1   1   1   0   0   1   0 
 1   0   1   1           1   1   1   1   1   1   1 
 1   1   0   0           1   0   1   1   1   0   0 
 1   1   0   1           1   0   1   0   0   0   1 
 1   1   1   0           1   0   0   0   1   1   0 
 1   1   1   1           1   0   0   1   0   1   1 

In general, the polynomial $p^n + 1$ may be factored as 

\[
p^n + 1 = g(p) h(p)
\]

where g(p) denotes the generator polynomial for the (n, k) cyclic code and 
h(p) denotes the parity polynomial that has degree k. The latter may be used 
to generate the dual code. 

For this purpose, we define the reciprocal polynomial of h(p) as 

\[
\begin{aligned}
p^k h(p^{-1}) &= p^k \left( p^{-k} + h_{k-1} p^{-k+1} + h_{k-2} p^{-k+2} + \cdots + h_1 p^{-1} + 1 \right) \\
              &= 1 + h_{k-1} p + h_{k-2} p^2 + \cdots + h_1 p^{k-1} + p^k
\end{aligned}
\tag{8-1-32}
\]

Clearly, the reciprocal polynomial is also a factor of $p^n + 1$. Hence, 
$p^k h(p^{-1})$ is the generator polynomial of an (n, n - k) cyclic code. This 
cyclic code is the dual code to the (n, k) code generated from g(p). Thus, the 
(n, n - k) dual code constitutes the null space of the (n, k) cyclic code. 


Example 8-1-4 

Let us consider the dual code to the (7, 4) cyclic code generated in Example 
8-1-3. This dual code is a (7, 3) cyclic code associated with the parity 
polynomial 

\[
\begin{aligned}
h_1(p) &= (p + 1)(p^3 + p + 1) \\
       &= p^4 + p^3 + p^2 + 1
\end{aligned}
\tag{8-1-33}
\]

The reciprocal polynomial is 

\[
p^4 h_1(p^{-1}) = 1 + p + p^2 + p^4
\]

This polynomial generates the (7, 3) dual code given in Table 8-1-3. The 
reader can verify that the code words in the (7, 3) dual code are orthogonal 
to the code words in the (7, 4) cyclic code of Example 8-1-3. Note that 
neither the (7, 4) nor the (7, 3) code is systematic. 
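
The orthogonality claim can be verified mechanically. The sketch below regenerates both codes by polynomial multiplication and tests every pair of code words; as before, polynomials are stored as integers, and the names `gf2_mul` and `dot` are illustrative assumptions.

```python
def gf2_mul(a, b):
    """Multiply two GF(2) polynomials stored as integers (bit i <-> p^i)."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

g1 = 0b1101          # p^3 + p^2 + 1, generator of the (7, 4) code
dual_gen = 0b10111   # p^4 + p^2 + p + 1, the reciprocal polynomial of h_1(p)

code = [gf2_mul(x, g1) for x in range(16)]
dual = [gf2_mul(x, dual_gen) for x in range(8)]

def dot(a, b):
    """Inner product modulo 2 of two binary words stored as integers."""
    return bin(a & b).count("1") % 2

orthogonal = all(dot(c, d) == 0 for c in code for d in dual)
print(orthogonal)  # True
```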

TABLE 8-1-3 (7, 3) DUAL CODE 
Generator Polynomial: p^4 h_1(p^{-1}) = p^4 + p^2 + p + 1 

Information bits     Code words 
p^2 p^1 p^0          p^6 p^5 p^4 p^3 p^2 p^1 p^0 
 0   0   0            0   0   0   0   0   0   0 
 0   0   1            0   0   1   0   1   1   1 
 0   1   0            0   1   0   1   1   1   0 
 0   1   1            0   1   1   1   0   0   1 
 1   0   0            1   0   1   1   1   0   0 
 1   0   1            1   0   0   1   0   1   1 
 1   1   0            1   1   1   0   0   1   0 
 1   1   1            1   1   0   0   1   0   1 

It is desirable to show how a generator matrix can be obtained from the 
generator polynomial of a cyclic (n, k) code. As previously indicated, the 
generator matrix for an (n, k) code can be constructed from any set of k 
linearly independent code words. Hence, given the generator polynomial g(p), 
an easily generated set of k linearly independent code words is the code words 
corresponding to the set of k linearly independent polynomials 

\[
p^{k-1} g(p), \; p^{k-2} g(p), \; \ldots, \; p\,g(p), \; g(p)
\]

Since any polynomial of degree less than or equal to n - 1 and divisible by 
g(p) can be expressed as a linear combination of this set of polynomials, the 
set forms a basis of dimension k. Consequently, the code words associated with 
these polynomials form a basis of dimension k for the (n, k) cyclic code. 


Example 8-1-5 

The four rows of the generator matrix for the (7, 4) cyclic code with 
generator polynomial $g_1(p) = p^3 + p^2 + 1$ are obtained from the polynomials 

\[
p^i g_1(p) = p^{3+i} + p^{2+i} + p^i, \qquad i = 3, 2, 1, 0
\]

It is easy to see that the generator matrix is 

\[
G_1 =
\begin{bmatrix}
1 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 1
\end{bmatrix}
\tag{8-1-34}
\]

Similarly, the generator matrix for the (7, 4) cyclic code generated by the 
polynomial $g_2(p) = p^3 + p + 1$ is 

\[
G_2 =
\begin{bmatrix}
1 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 1
\end{bmatrix}
\tag{8-1-35}
\]

The parity check matrices corresponding to $G_1$ and $G_2$ can be constructed in 
the same manner by using the respective reciprocal polynomials (Problem 
8-8). 


Note that the generator matrix obtained by this construction is not in 
systematic form. We can construct the generator matrix of a cyclic code in the 
systematic form $G = [\, I_k \mid P \,]$ from the generator polynomial as follows. 
First, we observe that the lth row of G corresponds to a polynomial of the form 
$p^{n-l} + R_l(p)$, l = 1, 2, ..., k, where $R_l(p)$ is a polynomial of degree less 
than n - k. This form can be obtained by dividing $p^{n-l}$ by g(p). Thus, we have 

\[
\frac{p^{n-l}}{g(p)} = Q_l(p) + \frac{R_l(p)}{g(p)}, \qquad l = 1, 2, \ldots, k
\]

or, equivalently, 

\[
p^{n-l} = Q_l(p) g(p) + R_l(p), \qquad l = 1, 2, \ldots, k \tag{8-1-36}
\]

where $Q_l(p)$ is the quotient. But $p^{n-l} + R_l(p)$ is a code word of the cyclic 
code since $p^{n-l} + R_l(p) = Q_l(p) g(p)$. Therefore the desired polynomial 
corresponding to the lth row of G is $p^{n-l} + R_l(p)$. 


Example 8-1-6 

For the (7, 4) cyclic code with generator polynomial $g_2(p) = p^3 + p + 1$, 
previously discussed in Example 8-1-5, we have 

\[
\begin{aligned}
p^6 &= (p^3 + p + 1) g_2(p) + p^2 + 1 \\
p^5 &= (p^2 + 1) g_2(p) + p^2 + p + 1 \\
p^4 &= p\, g_2(p) + p^2 + p \\
p^3 &= g_2(p) + p + 1
\end{aligned}
\]

Hence, the generator matrix of the code in systematic form is 

\[
G_2 =
\begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 1
\end{bmatrix}
\tag{8-1-37}
\]

and the corresponding parity check matrix is 

\[
H_2 =
\begin{bmatrix}
1 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1
\end{bmatrix}
\tag{8-1-38}
\]

It is left as an exercise for the reader to demonstrate that the generator 
matrix $G_2$ given by (8-1-35) and the systematic form given by (8-1-37) 
generate the same set of code words (Problem 8-2). 
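
The construction (8-1-36) amounts to computing the remainders of $p^{n-l}$ modulo g(p). A possible sketch, again with polynomials stored as integers and with the helper names `gf2_mod` and `span` assumed for illustration, builds the systematic rows and compares the code they generate with the one generated by the shifted copies of $g_2(p)$ in (8-1-35).

```python
def gf2_mod(a, g):
    """Remainder of a(p) divided by g(p) over GF(2); integers, bit i <-> p^i."""
    dg = g.bit_length() - 1
    while a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)   # cancel the leading term
    return a

def span(rows):
    """All GF(2) linear combinations of the given rows (stored as integers)."""
    words = {0}
    for row in rows:
        words |= {w ^ row for w in words}
    return words

n, k = 7, 4
g2 = 0b1011  # p^3 + p + 1

# Rows p^(k-1) g2, ..., p g2, g2 of the nonsystematic matrix (8-1-35).
G_shift = [g2 << i for i in range(k - 1, -1, -1)]
# Rows p^(n-l) + R_l(p), l = 1..k, of the systematic matrix (8-1-37).
G_sys = [(1 << (n - l)) | gf2_mod(1 << (n - l), g2) for l in range(1, k + 1)]

print([format(r, "07b") for r in G_sys])
# ['1000101', '0100111', '0010110', '0001011']
print(span(G_sys) == span(G_shift))  # True
```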


The method for constructing the generator matrix G in systematic form 
according to (8-1-36) also implies that a systematic code can be generated 
directly from the generator polynomial g(p). Suppose that we multiply the 
message polynomial X(p) by $p^{n-k}$. Thus, we obtain 

\[
p^{n-k} X(p) = x_{k-1} p^{n-1} + x_{k-2} p^{n-2} + \cdots + x_1 p^{n-k+1} + x_0 p^{n-k}
\]

In a systematic code, this polynomial represents the first k bits in the code 
word C(p). To this polynomial we must add a polynomial of degree less than 
n - k representing the parity check bits. Now, if $p^{n-k} X(p)$ is divided by g(p), 
the result is 

\[
\frac{p^{n-k} X(p)}{g(p)} = Q(p) + \frac{r(p)}{g(p)}
\]

or, equivalently, 

\[
p^{n-k} X(p) = Q(p) g(p) + r(p) \tag{8-1-39}
\]

where r(p) has degree less than n - k. Clearly, Q(p)g(p) is a code word of the 
cyclic code. Hence, by adding (modulo-2) r(p) to both sides of (8-1-39), we 
obtain the desired systematic code. 

To summarize, the systematic code may be generated by 

1 multiplying the message polynomial X(p) by $p^{n-k}$; 

2 dividing $p^{n-k} X(p)$ by g(p) to obtain the remainder r(p); and 

3 adding r(p) to $p^{n-k} X(p)$. 

Below we demonstrate how these computations can be performed by using 
shift registers with feedback. 
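
Before turning to the shift-register implementation, the three steps can be sketched directly in software. This is an illustration only: polynomials are stored as integers (bit i is the coefficient of $p^i$), and the function names `gf2_mod` and `encode_systematic` are assumptions.

```python
def gf2_mod(a, g):
    """Remainder of a(p) divided by g(p) over GF(2); integers, bit i <-> p^i."""
    dg = g.bit_length() - 1
    while a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)
    return a

def encode_systematic(x, g, n, k):
    """Systematic cyclic encoding of the k-bit message x with generator g."""
    shifted = x << (n - k)      # step 1: multiply X(p) by p^(n-k)
    r = gf2_mod(shifted, g)     # step 2: remainder r(p) of the division by g(p)
    return shifted ^ r          # step 3: add r(p) to p^(n-k) X(p)

c = encode_systematic(0b0110, 0b1011, 7, 4)   # g(p) = p^3 + p + 1
print(format(c, "07b"))  # 0110001
```

The message bits 0110 appear unchanged in the first four positions, followed by the parity check bits 001.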

Since $p^n + 1 = g(p)h(p)$ or, equivalently, $g(p)h(p) = 0 \bmod (p^n + 1)$, we 
say that the polynomials g(p) and h(p) are orthogonal. Furthermore, the 
polynomials $p^i g(p)$ and $p^j h(p)$ are also orthogonal for all i and j. However, 
the vectors corresponding to the polynomials g(p) and h(p) are orthogonal only if 
the ordered elements of one of these vectors are reversed. The same statement 
applies to the vectors corresponding to $p^i g(p)$ and $p^j h(p)$. In fact, if the 
parity polynomial h(p) is used as a generator for the (n, n - k) dual code, the 
set of code words obtained just comprises the same code words generated by the 
reciprocal polynomial except that the code vectors are reversed. This implies 
that the generator matrix for the dual code obtained from the reciprocal 
polynomial $p^k h(p^{-1})$ can also be obtained indirectly from h(p). Since the 
parity check matrix H for the (n, k) cyclic code is the generator matrix for the 
dual code, it follows that H can also be obtained from h(p). The following 
example illustrates these relationships. 


Example 8-1-7 

The dual code to the (7, 4) cyclic code generated by $g_1(p) = p^3 + p^2 + 1$ is 
the (7, 3) dual code that is generated by the reciprocal polynomial 
$p^4 h_1(p^{-1}) = p^4 + p^2 + p + 1$. However, we may also use $h_1(p)$ to obtain the 
generator matrix for the dual code. Then, the matrix corresponding to the 
polynomials $p^i h_1(p)$, i = 2, 1, 0, is 

\[
G_{h1} =
\begin{bmatrix}
1 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 & 1
\end{bmatrix}
\]

The generator matrix for the (7, 3) dual code, which is the parity check 
matrix for the (7, 4) cyclic code, consists of the rows of $G_{h1}$ with the 
elements of each row taken in reverse order. Thus, 

\[
H_1 =
\begin{bmatrix}
0 & 0 & 1 & 0 & 1 & 1 & 1 \\
0 & 1 & 0 & 1 & 1 & 1 & 0 \\
1 & 0 & 1 & 1 & 1 & 0 & 0
\end{bmatrix}
\]

The reader may verify that $G_1 H_1' = 0$. 

Note that the column vectors of $H_1$ consist of all seven binary vectors of 
length 3, except the all-zero vector. But this is just the description of the 
parity check matrix for a (7, 4) Hamming code. Therefore, the (7, 4) cyclic 
code is equivalent to the (7, 4) Hamming code discussed previously in 
Examples 8-1-1 and 8-1-2. 

Encoders for Cyclic Codes The encoding operations for generating a 
cyclic code may be performed by a linear feedback shift register based on the 
use of either the generator polynomial or the parity polynomial. First, let us 
consider the use of g(p). 

As indicated above, the generation of a systematic cyclic code involves three 
steps, namely multiplying the message polynomial X(p) by $p^{n-k}$, dividing the 
product by g(p), and, finally, adding the remainder to $p^{n-k} X(p)$. Of these 
three steps, only the division is nontrivial. 

The division of the polynomial $A(p) = p^{n-k} X(p)$ of degree n - 1 by the 
polynomial 

\[
g(p) = g_{n-k} p^{n-k} + g_{n-k-1} p^{n-k-1} + \cdots + g_1 p + g_0
\]

may be accomplished by the (n - k) stage feedback shift register illustrated in 
Fig. 8-1-2. Initially, the shift register contains all zeros. The coefficients of A(p) 
are clocked into the shift register one (bit) coefficient at a time, beginning with 
the higher-order coefficients, i.e., with $a_{n-1}$, followed by $a_{n-2}$, and so on. 
After the kth shift, the first nonzero output of the quotient is 
$q_1 = g_{n-k}^{-1} a_{n-1}$. 
Subsequent outputs are generated as illustrated in Fig. 8-1-2. For each output 
coefficient in the quotient, we must subtract the polynomial g(p) multiplied by 
that coefficient, as in ordinary long division. This subtraction is performed by 
means of the feedback part of the shift register. Thus, the feedback shift 
register in Fig. 8-1-2 performs division of two polynomials. 

In our case, $g_{n-k} = g_0 = 1$, and, for binary codes, the arithmetic operations 
are performed in modulo-2 arithmetic. Consequently, the subtraction operations 
reduce to modulo-2 addition. Furthermore, we are only interested in 
generating the parity check bits for each code word, since the code is 
systematic. Consequently, the encoder for the cyclic code takes the form 
illustrated in Fig. 8-1-3. The first k bits at the output of the encoder are simply 
the k information bits. These k bits are also clocked simultaneously into the 
shift register, since the switch 1 is in the closed position. Note that the 
polynomial multiplication of $p^{n-k}$ with X(p) is not performed explicitly. After 
the k information bits are all clocked into the encoder, the positions of the two 
switches are reversed. At this time, the contents of the shift register are simply 
the n - k parity check bits, which correspond to the coefficients of the 
remainder polynomial. These n - k bits are clocked out one at a time and sent 
to the modulator. 

FIGURE 8-1-2 A feedback shift register for dividing the polynomial A(p) by g(p). 

FIGURE 8-1-3 Encoding of a cyclic code by use of the generator polynomial g(p). 


Example 8-1-8 

The shift register for encoding the (7, 4) cyclic code with generator 
polynomial $g(p) = p^3 + p + 1$ is illustrated in Fig. 8-1-4. Suppose the input 
message bits are 0110. The contents of the shift register are as follows: 

Input     Shift     Shift register contents 
  -         0            0 0 0 
  0         1            0 0 0 
  1         2            1 1 0 
  1         3            1 0 1 
  0         4            1 0 0 

Hence, the three parity check bits are 100, which correspond to the code 
bits $c_5 = 0$, $c_6 = 0$, and $c_7 = 1$. 

FIGURE 8-1-4 The encoder for the (7, 4) cyclic code with generator polynomial g(p) = p^3 + p + 1. 

FIGURE 8-1-5 Encoder for an (n, k) cyclic code based on parity polynomial h(p). 
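
The register contents listed above can be reproduced with a small simulation of the feedback divider. The sketch assumes the usual arrangement in which the incoming bit is added to the output of the last stage and the result is fed back through the taps $g_0, g_1, \ldots, g_{n-k-1}$ (with $g_{n-k} = 1$); the function name `lfsr_states` and the tap-list layout are illustrative choices, not taken from the figure.

```python
def lfsr_states(message_bits, g_taps):
    """Clock the message (highest-order bit first) through the divider.

    g_taps = [g_0, g_1, ..., g_{n-k-1}] are the low-order coefficients of g(p);
    the leading coefficient g_{n-k} is assumed to be 1.
    Returns the register contents after each shift.
    """
    r = len(g_taps)
    state = [0] * r
    history = []
    for bit in message_bits:
        fb = bit ^ state[-1]   # feedback: input plus output of the last stage
        state = [fb & g_taps[0]] + [state[i - 1] ^ (fb & g_taps[i]) for i in range(1, r)]
        history.append(state)
    return history

# g(p) = p^3 + p + 1, so g_0 = 1, g_1 = 1, g_2 = 0.
history = lfsr_states([0, 1, 1, 0], [1, 1, 0])
for state in history:
    print(state)   # [0, 0, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0]
```

The final contents 100 hold the remainder coefficients in the order $r_0, r_1, r_2$; clocked out highest order first they give $c_5 = 0$, $c_6 = 0$, $c_7 = 1$.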


Instead of using the generator polynomial, we may implement the encoder 
for the cyclic code by making use of the parity polynomial 

\[
h(p) = p^k + h_{k-1} p^{k-1} + \cdots + h_1 p + 1
\]

The encoder is shown in Fig. 8-1-5. Initially, the k information bits are shifted 
into the shift register and simultaneously fed to the modulator. After all k 
information bits are in the shift register, the switch is thrown into position 2 
and the shift register is clocked n - k times to generate the n - k parity check 
bits as illustrated in Fig. 8-1-5. 

Example 8-1-9 

The parity polynomial for the (7, 4) cyclic code generated by 
$g(p) = p^3 + p + 1$ is $h(p) = p^4 + p^2 + p + 1$. The encoder for this code based 
on the parity polynomial is illustrated in Fig. 8-1-6. If the input to the encoder 
is the message bits 0110, the parity check bits are $c_5 = 0$, $c_6 = 0$, and 
$c_7 = 1$, as is easily verified. 

FIGURE 8-1-6 The encoder for the (7, 4) cyclic code based on the parity polynomial h(p) = p^4 + p^2 + p + 1. 

It should be noted that the encoder based on the generator polynomial is 
simpler when n - k < k (k > n/2), i.e., for high rate codes ($R_c > 1/2$), while the 
encoder based on the parity polynomial is simpler when k < n - k (k < n/2), 
which corresponds to low rate codes ($R_c < 1/2$). 

Cyclic Hamming Codes The class of cyclic codes includes the Hamming 
codes, which have a block length $n = 2^m - 1$ and n - k = m parity check bits, 
where m is any positive integer. The cyclic Hamming codes are equivalent to 
the Hamming codes described in Section 8-1-2. 

Cyclic (23, 12) Golay Code The linear (23, 12) Golay code described in 
Section 8-1-2 can be generated as a cyclic code by means of the generator 
polynomial 

\[
g(p) = p^{11} + p^9 + p^7 + p^6 + p^5 + p + 1 \tag{8-1-40}
\]

The code words have a minimum distance $d_{min} = 7$. 

Maximum-Length Shift-Register Codes Maximum-length shift-register 
codes are a class of cyclic codes with 

\[
(n, k) = (2^m - 1, \; m) \tag{8-1-41}
\]

where m is a positive integer. The code words are usually generated by means 
of an m-stage digital shift register with feedback, based on the parity 
polynomial. For each code word to be transmitted, the m information bits are 
loaded into the shift register, and the switch is thrown from position 1 to 
position 2. The contents of the shift register are shifted to the left one bit at a 
time for a total of $2^m - 1$ shifts. This operation generates a systematic code 
with the desired output length $n = 2^m - 1$. For example, the code words 
generated by the m = 3 stage shift register in Fig. 8-1-7 are listed in Table 
8-1-4. 

Note that, with the exception of the all-zero code word, all the code words 
generated by the shift register are different cyclic shifts of a single code word. 
The reason for this structure is easily seen from the state diagram of the shift 
register, which is illustrated in Fig. 8-1-8 for m = 3. When the shift register is 
loaded initially and shifted $2^m - 1$ times, it will cycle through all possible 
$2^m - 1$ states. Hence, the shift register is back to its original state in 
$2^m - 1$ shifts. 


FIGURE 8-1-7 Three-stage (m = 3) shift register with feedback. 

TABLE 8-1-4 MAXIMUM-LENGTH SHIFT-REGISTER CODE FOR m = 3 

Information bits     Code words 
 0   0   0            0   0   0   0   0   0   0 
 0   0   1            0   0   1   1   1   0   1 
 0   1   0            0   1   0   0   1   1   1 
 0   1   1            0   1   1   1   0   1   0 
 1   0   0            1   0   0   1   1   1   0 
 1   0   1            1   0   1   0   0   1   1 
 1   1   0            1   1   0   1   0   0   1 
 1   1   1            1   1   1   0   1   0   0 


Consequently, the output sequence is periodic with length $n = 2^m - 1$. Since 
there are $2^m - 1$ possible states, this length corresponds to the largest possible 
period. This explains why the $2^m - 1$ code words are different cyclic shifts of a 
single code word. 

Maximum-length shift-register codes exist for any positive value of m. 

FIGURE 8-1-8 The seven states for the m = 3 maximum length shift register. 
TABLE 8-1-5 SHIFT-REGISTER CONNECTIONS FOR GENERATING MAXIMUM-LENGTH SEQUENCES 

      Stages connected            Stages connected            Stages connected 
 m    to modulo-2 adder      m    to modulo-2 adder      m    to modulo-2 adder 
 2    1, 2                  13    1, 10, 11, 13         24    1, 18, 23, 24 
 3    1, 3                  14    1, 5, 9, 14           25    1, 23 
 4    1, 4                  15    1, 15                 26    1, 21, 25, 26 
 5    1, 4                  16    1, 5, 14, 16          27    1, 23, 26, 27 
 6    1, 6                  17    1, 15                 28    1, 26 
 7    1, 7                  18    1, 12                 29    1, 28 
 8    1, 5, 6, 7            19    1, 15, 18, 19         30    1, 8, 29, 30 
 9    1, 6                  20    1, 18                 31    1, 29 
10    1, 8                  21    1, 20                 32    1, 11, 31, 32 
11    1, 10                 22    1, 22                 33    1, 21 
12    1, 7, 9, 12           23    1, 19                 34    1, 8, 33, 34 

Source: Forney (1970). 


Table 8-1-5 lists the stages connected to the modulo-2 adder that result in a
maximum-length shift register for $2 \le m \le 34$.
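
The period claim can be checked numerically for small m. The following Python sketch simulates an m-stage feedback shift register and counts the shifts needed to return to the initial state. The function name `lfsr_sequence` and the stage-numbering convention (stage 1 is the output stage, and the modulo-2 sum of the listed stages is fed into stage m) are assumptions made for illustration; the table itself does not spell out the convention.

```python
def lfsr_sequence(m, taps, state=None):
    """Return one period of output bits from an m-stage feedback shift register.

    Assumed convention (not stated in the text): stage 1 is the output stage,
    and the modulo-2 sum of the stages listed in `taps` is fed into stage m.
    """
    s = list(state) if state is not None else [1] + [0] * (m - 1)
    start = tuple(s)
    out = []
    while True:
        out.append(s[0])          # bit leaving stage 1
        fb = 0
        for t in taps:            # modulo-2 adder over the tapped stages
            fb ^= s[t - 1]
        s = s[1:] + [fb]          # shift toward stage 1, feed back into stage m
        if tuple(s) == start:     # register is back in its initial state
            return out


# Three rows of Table 8-1-5: (m, stages connected to the modulo-2 adder)
for m, taps in [(3, (1, 3)), (4, (1, 4)), (5, (1, 4))]:
    seq = lfsr_sequence(m, taps)
    print(m, len(seq), sum(seq))  # m, period, number of ones in one period
```

Under the assumed convention, the measured period for each of these rows should equal $2^m - 1$; a different stage numbering would require adapting the indexing, not the idea.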

Another characteristic of the code words in a maximum-length shift-register
code is that each code word, with the exception of the all-zero code word,
contains $2^{m-1}$ ones and $2^{m-1} - 1$ zeros. Hence all these code words have identical
weights, namely, $w = 2^{m-1}$. Since the code is linear, this weight is also the
minimum distance of the code, i.e.,

$$d_{\min} = 2^{m-1}$$


Finally, note that the (7, 3) maximum-length shift-register code shown in 
Table 8-1-4 is identical to the (7, 3) code given in Table 8-1-3, which is the dual 
of the (7, 4) Hamming code given in Table 8-1-2. This is not a coincidence. The 
maximum-length shift-register codes are the dual codes of the cyclic Hamming
$(2^m - 1, 2^m - 1 - m)$ codes.

The shift register for generating the maximum-length code may also be used 
to generate a periodic binary sequence with period $n = 2^m - 1$. The binary
periodic sequence exhibits a periodic autocorrelation $\phi(m)$ with values
$\phi(m) = n$ for $m = 0, \pm n, \pm 2n, \ldots$, and $\phi(m) = -1$ for all other shifts as
described in Section 13-2-4. This impulse-like autocorrelation implies that the
power spectrum is nearly white and, hence, the sequence resembles white 
noise. As a consequence, maximum-length sequences are called pseudo-noise 
(PN) sequences and find use in the scrambling of data and in the generation of 
spread spectrum signals. 
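
A small numerical check of the two-valued autocorrelation, sketched in Python. The helper name `periodic_autocorrelation` and the mapping of bits 0 and 1 to the levels -1 and +1 are assumptions made for this illustration; the sequence used is one nonzero code word of the (7, 3) maximum-length code.

```python
def periodic_autocorrelation(bits):
    """Periodic autocorrelation of a binary sequence mapped to -1/+1 levels."""
    x = [2 * b - 1 for b in bits]   # assumed mapping: 0 -> -1, 1 -> +1
    n = len(x)
    return [sum(x[i] * x[(i + shift) % n] for i in range(n))
            for shift in range(n)]


# One period of the m = 3 maximum-length sequence (code word 0011101)
phi = periodic_autocorrelation([0, 0, 1, 1, 1, 0, 1])
print(phi)  # value n = 7 at zero shift, -1 at every other shift
```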


Bose-Chaudhuri-Hocquenghem (BCH) Codes  BCH codes comprise a
large class of cyclic codes that include both binary and nonbinary alphabets.

Binary BCH codes may be constructed with parameters

$$n = 2^m - 1$$
$$n - k \le mt$$    (8-1-42)
$$d_{\min} = 2t + 1$$

where m ($m \ge 3$) and t are arbitrary positive integers. Hence, this class of
binary codes provides the communications system designer with a large 
selection of block lengths and code rates. Nonbinary BCH codes include the 
powerful Reed-Solomon codes that are described later. 

The generator polynomials for BCH codes can be constructed from factors 
of $p^{2^m - 1} + 1$. Table 8-1-6 lists the coefficients of generator polynomials for
BCH codes of block lengths $7 \le n \le 255$, corresponding to $3 \le m \le 8$. The
coefficients are given in octal form, with the left-most digit corresponding to
the highest-degree term of the generator polynomial. Thus, the coefficients of
the generator polynomial for the (15,5) code are 2467, which in binary form is
10100110111. Consequently, the generator polynomial is
$g(p) = p^{10} + p^8 + p^5 + p^4 + p^2 + p + 1$.

A more extensive list of generator polynomials for BCH codes is given by
Peterson and Weldon (1972), who tabulate the polynomial factors of $p^{2^m - 1} + 1$
for $m \le 34$.
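
The octal-to-polynomial conversion described above is mechanical, and a short Python sketch may help when reading the table. The function name `octal_to_exponents` is a hypothetical helper introduced here.

```python
def octal_to_exponents(octal_str):
    """Exponents of the nonzero terms of g(p), highest degree first.

    The left-most octal digit corresponds to the highest-degree term.
    """
    bits = bin(int(octal_str, 8))[2:]   # binary string without leading zeros
    degree = len(bits) - 1
    return [degree - i for i, b in enumerate(bits) if b == "1"]


print(octal_to_exponents("2467"))  # (15, 5) code: p^10 + p^8 + p^5 + p^4 + p^2 + p + 1
print(octal_to_exponents("13"))    # (7, 4) code:  p^3 + p + 1
```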


8-1-4 Optimum Soft-Decision Decoding of Linear Block 
Codes 


In this subsection, we derive the performance of linear binary block codes on 
an AWGN channel when optimum (unquantized) soft-decision decoding is 
employed at the receiver. The bits of a code word may be transmitted by any 
one of the binary signaling methods described in Chapter 5. For our purposes, 
we consider binary (or quaternary) coherent PSK, which is the most efficient 
method, and binary orthogonal FSK either with coherent detection or 
noncoherent detection. 

Let $\mathcal{E}$ denote the transmitted signal energy per code word and let $\mathcal{E}_c$ denote
the signal energy required to transmit a single element (bit) in the code word.
Since there are n bits per code word, $\mathcal{E} = n\mathcal{E}_c$, and since each code word
conveys k bits of information, the energy per information bit is

$$\mathcal{E}_b = \frac{\mathcal{E}}{k} = \frac{n}{k}\mathcal{E}_c = \frac{\mathcal{E}_c}{R_c}$$    (8-1-43)

The code words are assumed to be equally likely a priori with prior probability
1/M.

TABLE 8-1-6 COEFFICIENTS OF GENERATOR POLYNOMIALS (IN OCTAL FORM) FOR BCH
CODES OF LENGTHS 7 ≤ n ≤ 255

  n     k     t    g(p)
  7     4     1    13
 15    11     1    23
        7     2    721
        5     3    2467
 31    26     1    45
       21     2    3551
       16     3    107657
       11     5    5423325
        6     7    313365047
 63    57     1    103
       51     2    12471
       45     3    1701317
       39     4    166623567
       36     5    1033506423
       30     6    157464165547
       24     7    17323260404441
       18    10    1363026512351725
       16    11    6331141367235453
       10    13    472622305527250155
        7    15    5231045543503271737
127   120     1    211
      113     2    41567
      106     3    11554743
       99     4    3447023271
       92     5    624730022327
       85     6    130704476322273
       78     7    26230002166130115
       71     9    6255010713253127753
       64    10    1206534025570773100045
       57    11    335265252505705053517721
       50    13    54446512523314012421501421
       43    14    17721772213651227521220574343
       36    15    3146074666522075044764574721735
       29    21    403114461367670603667530141176155
       22    23    123376070404722522435445626637647043
       15    27    22057042445604554770523013762217604353
        8    31    7047264052751030651476224271567733130217
255   247     1    435
      239     2    267543
      231     3    156720665
      223     4    75626641375
      215     5    23157564726421
      207     6    16176560567636227
      199     7    7633031270420722341
      191     8    2663470176115333714567
      187     9    52755313540001322236351
      179    10    22624710717340432416300455
      171    11    1541621421234235607706163067
      163    12    7500415510075602551574724514601
      155    13    3757513005407665015722506464677633
      147    14    1642130173537165525304165305441011711
      139    15    461401732060175561570722730247453567445
      131    18    2157133314715101512612502774421420241
                   65471
      123    19    1206140522420660037172103265161412262
                   72506267
      115    21    6052666557210024726363640460027635255
                   6313472737
      107    22    2220577232206625631241730023534742017
                   6574750154441
       99    23    1065666725347317422274141620157433225
                   2411076432303431
       91    25    6750265030327444172723631724732511075
                   550762720724344561
       87    26    1101367634147432364352316343071720462
                   06722545273311721317
       79    27    6670003563765750002027034420736617462
                   1015326711766541342355
       71    29    2402471052064432151555417211233116320
                   5444250362557643221706035
       63    30    1075447505516354432531521735770700366
                   6111726455267613656702543301
       55    31    7315425203501100133015275306032054325
                   414326755010557044426035473617
       47    42    2533542017062646563033041377406233175
                   123334145446045005066024552543173
       45    43    1520205605523416113110134637642370156
                   3670024470762373033202157025051541
       37    45    5136330255067007414177447245437530420
                   735706174323432347644354737403044003
       29    47    3025715536673071465527064012361377115
                   34224232420117411406025475741040356
                   5037
       21    55    1256215257060332656001773153607612103
                   22734140565307454252115312161446651
                   3473725
       13    59    4641732005052564544426573714250066004
                   33067744547656140317467721357026134
                   460500547
        9    63    1572602521747246320103104325535513461
                   41623672120440745451127661155477055
                   61677516057

Source: Stenbit (1964), © 1964 IEEE.


Suppose the bits of a code word are transmitted by binary PSK. Thus each
code word results in one of M signaling waveforms. From Chapter 5, we know
that the optimum receiver, in the sense of minimizing the average probability
of a code word error, for the AWGN channel, can be realized as a parallel
bank of M filters matched to the M possible transmitted waveforms. The
outputs of the M matched filters at the end of each signaling interval, which
encompasses the transmission of n bits in the code word, are compared and the
code word corresponding to the largest matched filter output is selected.
Alternatively, M cross-correlators can be employed. In either case, the receiver
implementation can be simplified. That is, an equivalent optimum receiver can
be realized by use of a single filter (or cross-correlator) matched to the binary
PSK waveform used to transmit each bit in the code word, followed by a
decoder that forms the M decision variables corresponding to the M code
words.

To be specific, let $r_j$, $j = 1, 2, \ldots, n$, represent the n sampled outputs of the
matched filter for any particular code word. Since the signaling is binary
coherent PSK, the output $r_j$ may be expressed either as

$$r_j = \sqrt{\mathcal{E}_c} + n_j$$    (8-1-44)

when the jth bit of a code word is a 1, or as

$$r_j = -\sqrt{\mathcal{E}_c} + n_j$$    (8-1-45)

when the jth bit is a 0. The variables $\{n_j\}$ represent additive white gaussian
noise at the sampling instants. Each $n_j$ has zero mean and variance $\tfrac{1}{2}N_0$. From
knowledge of the M possible transmitted code words and upon reception of
$\{r_j\}$, the optimum decoder forms the M correlation metrics

$$CM_i = C(\mathbf{r}, \mathbf{C}_i) = \sum_{j=1}^{n} (2c_{ij} - 1) r_j, \qquad i = 1, 2, \ldots, M$$    (8-1-46)

where $c_{ij}$ denotes the bit in the jth position of the ith code word. Thus, if
$c_{ij} = 1$, the weighting factor $2c_{ij} - 1 = 1$, and if $c_{ij} = 0$, the weighting factor
$2c_{ij} - 1 = -1$. In this manner, the weighting $2c_{ij} - 1$ aligns the signal
components in $\{r_j\}$ such that the correlation metric corresponding to the actual
transmitted code word will have a mean value $\sqrt{\mathcal{E}_c}\,n$, while the other M - 1
metrics will have smaller mean values.
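
The correlation-metric decoder can be sketched in a few lines of Python. This is a minimal illustration, not a practical decoder: the code-word list is the (7, 3) code of Table 8-1-4, the received samples are made-up values (unit signal amplitude plus arbitrary noise), and the function names are hypothetical.

```python
# (7, 3) maximum-length code words from Table 8-1-4
CODE_WORDS = [
    [0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 1, 1], [0, 1, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1, 1, 0], [1, 0, 1, 0, 0, 1, 1],
    [1, 1, 0, 1, 0, 0, 1], [1, 1, 1, 0, 1, 0, 0],
]


def correlation_metrics(r, code_words):
    """CM_i = sum_j (2 c_ij - 1) r_j for every code word."""
    return [sum((2 * c - 1) * rj for c, rj in zip(cw, r)) for cw in code_words]


def soft_decode(r, code_words):
    """Select the code word with the largest correlation metric."""
    metrics = correlation_metrics(r, code_words)
    return code_words[metrics.index(max(metrics))]


# Made-up matched-filter outputs for transmitted word 1 0 0 1 1 1 0.
# The third sample has the wrong sign, so a hard decision on it would be in error.
r = [0.9, -1.2, 0.3, 1.1, 0.2, 0.8, -0.7]
print(soft_decode(r, CODE_WORDS))
```

Forming all M metrics costs on the order of Mn operations, which is why exhaustive soft-decision decoding is limited to codes with few code words.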

Although the computations involved in forming the correlation metrics for 
soft-decision decoding according to (8-1-46) are relatively simple, it may still be 
impractical to compute (8-1-46) for all the possible code words when the 
number of code words is large, e.g., $M > 2^{10}$. In such a case it is still possible to
implement soft-decision decoding using algorithms which employ techniques 
for discarding improbable code words without computing their entire correla- 
tion metrics as given by (8-1-46). Several different types of soft-decision 
decoding algorithms have been described in the technical literature. The 
interested reader is referred to the papers by Forney (1966b), Weldon (1971),
Chase (1972), Wainberg and Wolf (1973), Wolf (1978), and Matis and 
Modestino (1982). 

In determining the probability of error for a linear block code, note that
when such a code is employed on a binary-input, symmetric channel such as
the AWGN channel with optimum soft-decision decoding, the error probability
for the transmission of the mth code word is the same for all m. Hence, we
assume for simplicity that the all-zero code word $\mathbf{C}_1$ is transmitted. For correct
decoding of $\mathbf{C}_1$, the correlation metric $CM_1$ must exceed all the other M - 1
correlation metrics $CM_m$, $m = 2, 3, \ldots, M$. All the CM are gaussian distributed.
The mean value of $CM_1$ is $\sqrt{\mathcal{E}_c}\,n$, while the mean values of $CM_m$, $m = 2, 3, \ldots, M$,
are $\sqrt{\mathcal{E}_c}\,n(1 - 2w_m/n)$. The variance of each decision variable is $\tfrac{1}{2}nN_0$. The
derivation of the exact expression for the probability of correct decoding or,
equivalently, the probability of a code word error is complicated by the
correlations among the M correlation metrics. The cross-correlation
coefficients between $\mathbf{C}_1$ and the other M - 1 code words are

$$\rho_m = 1 - 2w_m/n, \qquad m = 2, \ldots, M$$    (8-1-47)

where $w_m$ denotes the weight of the mth code word.

Instead of attempting to derive the exact error probability, we resort to a 
union bound. The probability that $CM_m > CM_1$ is

$$P_2(m) = Q\left(\sqrt{\frac{\mathcal{E}}{N_0}(1 - \rho_m)}\right)$$    (8-1-48)

where $\mathcal{E} = k\mathcal{E}_b$ is the transmitted energy per waveform. Substitution for $\rho_m$
from (8-1-47) and for $\mathcal{E}$ yields

$$P_2(m) = Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0} R_c w_m}\right) = Q\left(\sqrt{2\gamma_b R_c w_m}\right)$$    (8-1-49)

where $\gamma_b$ is the SNR per bit and $R_c$ is the code rate. Then the average
probability of a code word error is bounded from above by the sum of the
binary error events given by (8-1-49). Thus,

$$P_M \le \sum_{m=2}^{M} P_2(m) \le \sum_{m=2}^{M} Q\left(\sqrt{2\gamma_b R_c w_m}\right)$$    (8-1-50)

The computation of the probability of error for soft-decision decoding 
according to (8-1-50) requires knowledge of the weight distribution of the 
code. Weight distributions of many codes are given in a number of texts on 
coding theory, e.g., Berlekamp (1968) and MacWilliams and Sloane (1977). 

A somewhat looser bound is obtained by noting that 

$$Q\left(\sqrt{2\gamma_b R_c w_m}\right) \le Q\left(\sqrt{2\gamma_b R_c d_{\min}}\right) < \exp(-\gamma_b R_c d_{\min})$$    (8-1-51)

Consequently,

$$P_M \le (M - 1)\,Q\left(\sqrt{2\gamma_b R_c d_{\min}}\right) < \exp(-\gamma_b R_c d_{\min} + k \ln 2)$$    (8-1-52)

This bound is particularly useful since it does not require knowledge of the
weight distribution of the code. When the upper bound in (8-1-52) is compared
with the performance of an uncoded binary PSK system, which is upper-bounded
as $\tfrac{1}{2}\exp(-\gamma_b)$, we find that coding yields a gain of approximately
$10 \log (R_c d_{\min} - k \ln 2/\gamma_b)$ dB. We may call this the coding gain. We note that
its value depends on the code parameters and also on the SNR per bit $\gamma_b$.
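
To make the relationship among the three bounds concrete, the following Python sketch evaluates the union bound of (8-1-50) and the two minimum-distance bounds of (8-1-52) for the (7, 3) maximum-length code, whose seven nonzero code words all have weight 4. The function names and the 6 dB operating point are arbitrary choices for illustration.

```python
import math


def Q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def union_bound(gamma_b, n, k, weights):
    """Sum of Q(sqrt(2 gamma_b R_c w_m)) over the nonzero code words."""
    rc = k / n
    return sum(Q(math.sqrt(2.0 * gamma_b * rc * w)) for w in weights)


def dmin_bounds(gamma_b, n, k, d_min):
    """The two minimum-distance bounds: (M-1) Q(...) and exp(...)."""
    rc = k / n
    m_words = 2 ** k
    tight = (m_words - 1) * Q(math.sqrt(2.0 * gamma_b * rc * d_min))
    loose = math.exp(-gamma_b * rc * d_min + k * math.log(2.0))
    return tight, loose


gamma_b = 10 ** (6.0 / 10)                   # SNR per bit of 6 dB (arbitrary)
ub = union_bound(gamma_b, 7, 3, [4] * 7)     # all nonzero weights equal 4
tight, loose = dmin_bounds(gamma_b, 7, 3, 4)
print(ub, tight, loose)
```

For this particular code the union bound and the (M - 1)Q bound coincide, because every nonzero code word has weight equal to the minimum distance; for codes with a spread of weights the union bound is the smaller of the two.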

The expression for the probability of error for equicorrelated waveforms 
that can be obtained for the simplex signals described in Section 5-2 gives us 
yet a third approximation to the error probabilities for coded waveforms. We 
know that the maximum cross-correlation coefficient between a pair of coded 
waveforms is 


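$$\rho_{\max} = 1 - \frac{2}{n} d_{\min}$$    (8-1-53)

If we assume as a worst case that all the M code words have a cross-correlation
coefficient equal to $\rho_{\max}$, then the code word error probability can easily be
evaluated. Since some code words are separated by more than the minimum
distance, the error probability evaluated for $\rho_r = \rho_{\max}$ is actually an upper
bound. Thus,

$$P_M \le 1 - \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-v^2/2} \left( \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{v + \sqrt{4\gamma_b R_c d_{\min}}} e^{-x^2/2}\, dx \right)^{M-1} dv$$    (8-1-54)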

The bounds on the performance of linear block codes given above are in 
terms of the block error or code word error probability. The evaluation of the 
equivalent bit error probability $P_b$ is much more complicated. In general, when
a block error is made, some of the k information bits in the block will be
correct and some will be in error. For orthogonal waveforms, the conversion
factor that multiplies $P_M$ to yield $P_b$ is $2^{k-1}/(2^k - 1)$. This factor is unity for
k = 1 and approaches $\tfrac{1}{2}$ as k increases, which is equivalent to assuming that, on
the average, half of the k bits will be in error when a block error occurs. The
conversion factor for coded waveforms depends in a complicated way on the
distance properties of the code, but is certainly no worse than assuming that,
on the average, half of the k bits will be in error when a block error occurs.
Consequently, $P_b \le \tfrac{1}{2}P_M$.

The bounds on performance given by (8-1-50), (8-1-52), and (8-1-54) also 
apply to the case in which a pair of bits of a code word are transmitted by 
quaternary PSK, since quaternary PSK may be viewed as being equivalent to 
two independent binary PSK waveforms transmitted in phase quadrature. 
Furthermore, the bounds in (8-1-52) and (8-1-54), which depend only on the 
minimum distance of the code, apply also to nonlinear binary block codes. 

If binary orthogonal FSK is used to transmit each bit of a code word on the 
AWGN channel, the optimum receiver can be realized by means of two 
matched filters, one matched to the frequency corresponding to a transmission 
of a 0, and the other to the frequency corresponding to a transmission of a 1, 
followed by a decoder that forms the M correlation metrics corresponding to
the M possible code words. The detection at the receiver may be coherent or
noncoherent. In either case, let $r_{0j}$ and $r_{1j}$ denote the input samples to the
combiner. The correlation metrics formed by the decoder may be expressed as

$$CM_i = \sum_{j=1}^{n} \left[c_{ij} r_{1j} + (1 - c_{ij}) r_{0j}\right], \qquad i = 1, 2, \ldots, M$$    (8-1-55)

where $c_{ij}$ represents the jth bit in the ith code word. The code word
corresponding to the largest of the {CM,} is selected as the transmitted code 
word. 

If the detection of the binary FSK waveforms is coherent, the random
variables $\{r_{0j}\}$ and $\{r_{1j}\}$ are gaussian and, hence, the correlation metrics $\{CM_i\}$
are also gaussian. In this case, bounds on the performance of the code are
easily obtained. To be specific, suppose that the all-zero code word $\mathbf{C}_1$ is
transmitted. Then,

$$r_{0j} = \sqrt{\mathcal{E}_c} + n_{0j}, \qquad r_{1j} = n_{1j}, \qquad j = 1, 2, \ldots, n$$    (8-1-56)

where the $\{n_{ij}\}$, i = 0, 1, $j = 1, 2, \ldots, n$, are mutually statistically independent
gaussian random variables with zero mean and variance $\tfrac{1}{2}N_0$. Consequently,
$CM_1$ is gaussian with mean $\sqrt{\mathcal{E}_c}\,n$ and variance $\tfrac{1}{2}nN_0$. On the other hand, the
correlation metric $CM_m$, corresponding to the code word having weight $w_m$, is
gaussian with mean $\sqrt{\mathcal{E}_c}\,n(1 - w_m/n)$ and variance $\tfrac{1}{2}nN_0$. Since the $\{CM_m\}$ are
correlated, we again resort to a union bound. The correlation coefficients are
given by

$$\rho_m = 1 - w_m/n$$    (8-1-57)

Hence, the probability that $CM_m > CM_1$ is

$$P_2(m) = Q\left(\sqrt{\gamma_b R_c w_m}\right)$$    (8-1-58)

Comparison of this result with that given in (8-1-49) for coherent PSK reveals 
that coherent PSK requires 3 dB less SNR to achieve the same performance. 
This is not surprising in view of the fact that uncoded PSK is 3 dB better than 
binary orthogonal FSK with coherent detection. Hence, the advantage of PSK 
over FSK is maintained in the coded waveforms. We conclude, then, that the 
bounds given in (8-1-50), (8-1-52), and (8-1-54) apply to coded waveforms 
transmitted by binary orthogonal coherent FSK with $\gamma_b$ replaced by $\tfrac{1}{2}\gamma_b$.

If square-law detection of the binary orthogonal FSK signal is employed at 
the receiver, the performance is further degraded by the noncoherent 
combining loss, as shown in Chapter 12. Suppose again that the all-zero code 
word is transmitted. Then the correlation metrics are given by (8-1-55), where 
the input variables to the decoder are now 


$$r_{0j} = |\sqrt{\mathcal{E}_c} + N_{0j}|^2$$
$$r_{1j} = |N_{1j}|^2, \qquad j = 1, 2, \ldots, n$$    (8-1-59)

where $\{N_{0j}\}$ and $\{N_{1j}\}$ represent complex-valued mutually statistically
independent gaussian random variables with zero mean and variance $N_0$. The
correlation metric $CM_1$ is given as

$$CM_1 = \sum_{j=1}^{n} r_{0j}$$    (8-1-60)

while the correlation metric corresponding to the code word having weight
$w_m$ is statistically equivalent to the correlation metric of a code word in which
$c_{mj} = 1$ for $1 \le j \le w_m$ and $c_{mj} = 0$ for $w_m + 1 \le j \le n$. Hence, $CM_m$ may be
expressed as

$$CM_m = \sum_{j=1}^{w_m} r_{1j} + \sum_{j=w_m+1}^{n} r_{0j}$$    (8-1-61)

The difference between $CM_1$ and $CM_m$ is

$$CM_1 - CM_m = \sum_{j=1}^{w_m} (r_{0j} - r_{1j})$$    (8-1-62)

and the probability of error is simply the probability that $CM_1 - CM_m < 0$. But
this difference is a special case of the general quadratic form in complex-valued
gaussian random variables considered in Chapter 12 and Appendix B. The
expression for the probability of error in deciding between $CM_1$ and $CM_m$ is
(see Section 12-1-1)

$$P_2(m) = \frac{1}{2^{2w_m - 1}} \exp\left(-\tfrac{1}{2}\gamma_b R_c w_m\right) \sum_{i=0}^{w_m - 1} K_i \left(\tfrac{1}{2}\gamma_b R_c w_m\right)^i$$    (8-1-63)

where, by definition,

$$K_i = \frac{1}{i!} \sum_{r=0}^{w_m - 1 - i} \binom{2w_m - 1}{r}$$    (8-1-64)

The union bound obtained by summing $P_2(m)$ over $2 \le m \le M$ provides us with
an upper bound on the probability of a code word error.

As an alternative, we may use the minimum distance instead of the weight
distribution to obtain the looser upper bound

$$P_M \le \frac{M - 1}{2^{2d_{\min} - 1}} \exp\left(-\tfrac{1}{2}\gamma_b R_c d_{\min}\right) \sum_{i=0}^{d_{\min} - 1} K_i \left(\tfrac{1}{2}\gamma_b R_c d_{\min}\right)^i$$    (8-1-65)
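
The pairwise error probability for square-law detection can be evaluated numerically. The Python sketch below assumes the standard multichannel square-law form, $P_2 = 2^{-(2w-1)} e^{-x} \sum_{i=0}^{w-1} K_i x^i$ with $x = \tfrac{1}{2}\gamma_b R_c w$ and $K_i = \frac{1}{i!}\sum_{r=0}^{w-1-i}\binom{2w-1}{r}$; the function names are hypothetical, and the formula should be checked against Section 12-1-1 before being relied upon.

```python
import math


def k_coeff(i, w):
    """K_i = (1/i!) * sum_{r=0}^{w-1-i} C(2w-1, r)  (assumed form)."""
    return sum(math.comb(2 * w - 1, r) for r in range(w - i)) / math.factorial(i)


def p2_square_law(gamma_b, rc, w):
    """Pairwise error probability for two code words differing in w positions."""
    x = 0.5 * gamma_b * rc * w
    total = sum(k_coeff(i, w) * x ** i for i in range(w))
    return math.exp(-x) * total / 2 ** (2 * w - 1)


# Sanity checks on the assumed formula:
# w = 1 reduces to uncoded noncoherent binary FSK, (1/2) exp(-x);
# zero SNR gives probability 1/2 for any w.
print(p2_square_law(4.0, 0.5, 1), 0.5 * math.exp(-1.0))
print(p2_square_law(0.0, 0.5, 4))
```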


A measure of the noncoherent combining loss inherent in the square-law 
detection and combining of the n elementary binary FSK waveforms in a code 
word can be obtained from Fig. 12-1-1, where $d_{\min}$ is used in place of L. The
loss obtained is relative to the case in which the n elementary binary FSK
waveforms are first detected coherently and combined as in (8-1-55) and then
the sums are square-law-detected or envelope-detected to yield the M decision
variables. The binary error probability for the latter case is

$$P_2(m) = \tfrac{1}{2}\exp\left(-\tfrac{1}{2}\gamma_b R_c w_m\right)$$    (8-1-66)

and, hence,

$$P_M \le \sum_{m=2}^{M} P_2(m)$$

If $d_{\min}$ is used instead of the weight distribution, the union bound for the code
word error probability in the latter case is

$$P_M \le \tfrac{1}{2}(M - 1)\exp\left(-\tfrac{1}{2}\gamma_b R_c d_{\min}\right)$$    (8-1-67)

The channel bandwidth required to transmit the coded waveforms can be 
determined as follows. If binary PSK is used to transmit each bit in a code 
word, the required bandwidth is approximately equal to the reciprocal of the 
time interval devoted to the transmission of each bit. For an information rate 
of R bits/s, the time available to transmit k information bits and n - k
redundant (parity) bits (n total bits) is T = k/R. Hence,

$$W = \frac{1}{T/n} = \frac{n}{k/R} = \frac{R}{R_c}$$    (8-1-68)

Therefore, the bandwidth expansion factor $B_e$ for the coded waveform is

$$B_e = \frac{W}{R} = \frac{n}{k} = \frac{1}{R_c}$$    (8-1-69)


On the other hand, if binary FSK with noncoherent detection is employed for 
transmitting the bits in a code word, W = 2n/T, and, hence, the bandwidth 
expansion factor increases by approximately a factor of 2 relative to binary 
PSK. In any case, B e increases inversely with the code rate, or, equivalently, it 
increases linearly with the block size n. 

We are now in a position to compare the performance characteristics and 
bandwidth requirements of coded signaling waveforms with orthogonal signal- 
ing waveforms. A comparison of the expression for P M given in (5-2-21) for 
orthogonal waveforms and in (8-1-54) for coded waveforms with coherent PSK 
indicates that the coded waveforms result in a loss of at most 
$10 \log (n/2d_{\min})$ dB relative to orthogonal waveforms having the same number
of waveforms. On the other hand, if we compensate for the loss in SNR due to
coding by increasing the number of code words so that coded transmission
requires $M_c = 2^{k_c}$ waveforms and orthogonal signaling requires $M_o = 2^{k_o}$
waveforms, then [from the union bounds in (5-2-27) and (8-1-52)] the
performance obtained with the two sets of signaling waveforms at high SNR is
about equal if

$$k_o = 2R_c d_{\min}$$    (8-1-70)





Under this condition, the bandwidth expansion factor for orthogonal signaling 
can be expressed as 


$$B_{eo} = \frac{M_o}{2\log_2 M_o} = \frac{2^{k_o}}{2k_o} = \frac{2^{2R_c d_{\min}}}{4R_c d_{\min}}$$    (8-1-71)

while, for coded signaling waveforms, we have $B_{ec} = 1/R_c$. The ratio of $B_{eo}$,
given in (8-1-71), to $B_{ec}$, which is

$$\frac{B_{eo}}{B_{ec}} = \frac{2^{2R_c d_{\min}}}{4d_{\min}}$$    (8-1-72)


provides a measure of the relative bandwidth between orthogonal signaling 
and signaling with coded coherent PSK waveforms. 

For example, suppose we use a (63,33) binary cyclic code that has a 
minimum distance $d_{\min} = 12$. The bandwidth ratio for orthogonal signaling
relative to this code, given by (8-1-72), is 127. This is indicative of the 
bandwidth efficiency obtained through coding relative to orthogonal signaling. 
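
The figure quoted in this example follows directly from the ratio $2^{2R_c d_{\min}}/(4d_{\min})$. A short Python check, with a hypothetical function name:

```python
def bandwidth_ratio(n, k, d_min):
    """Ratio B_eo / B_ec = 2**(2 R_c d_min) / (4 d_min)."""
    rc = k / n
    return 2 ** (2 * rc * d_min) / (4 * d_min)


ratio = bandwidth_ratio(63, 33, 12)
print(round(ratio))  # about 127 for the (63, 33) code with d_min = 12
```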


8-1-5 Hard-Decision Decoding 

The bounds given in Section 8-1-4 on the performance of coded signaling 
waveforms on the AWGN channel are based on the premise that the samples 
from the matched filter or cross correlator are not quantized. Although this 
processing yields the best performance, the basic limitation is the computa- 
tional burden of forming M correlation metrics and comparing these to obtain 
the largest. The amount of computation becomes excessive when the number 
M of code words is large. 

To reduce the computational burden, the analog samples can be quantized 
and the decoding operations are then performed digitally. In this subsection, 
we consider the extreme situation in which each sample corresponding to a 
single bit of a code word is quantized to two levels: zero and one. That is, a 
(hard) decision is made as to whether each transmitted bit in a code word is a 0 
or a 1. The resulting discrete-time channel (consisting of the modulator, the 
AWGN channel, and the demodulator) constitutes a BSC with crossover 
probability p. If coherent PSK is employed in transmitting and receiving the 
bits in each code word then 

$$p = Q\left(\sqrt{\frac{2\mathcal{E}_c}{N_0}}\right) = Q\left(\sqrt{2\gamma_b R_c}\right)$$    (8-1-73)

On the other hand, if FSK is used to transmit the bits in each code word then

$$p = Q\left(\sqrt{\gamma_b R_c}\right)$$    (8-1-74)

for coherent detection and

$$p = \tfrac{1}{2}\exp\left(-\tfrac{1}{2}\gamma_b R_c\right)$$    (8-1-75)

for noncoherent detection.


Minimum-Distance (Maximum-Likelihood) Decoding  The n bits from the
demodulator corresponding to a received code word are passed to the decoder, 
which compares the received code word with the M possible transmitted code 
words and decides in favor of the code word that is closest in Hamming 
distance (number of bit positions in which two code words differ) to the 
received code word. This minimum distance decoding rule is optimum in the 
sense that it results in a minimum probability of a code word error for the 
binary symmetric channel. 

A conceptually simple, albeit computationally inefficient, method for 
decoding is to first add (modulo 2) the received code word vector to all the M 
possible transmitted code words $\mathbf{C}_i$ to obtain the error vectors $\mathbf{e}_i$. Hence, $\mathbf{e}_i$
represents the error event that must have occurred on the channel in order to
transform the code word $\mathbf{C}_i$ into the particular received code word. The
number of errors in transforming $\mathbf{C}_i$ into the received code word is just equal
to the number of 1s in $\mathbf{e}_i$. Thus, if we simply compute the weight of each of the
M error vectors $\{\mathbf{e}_i\}$ and decide in favor of the code word that results in the
smallest weight error vector, we have, in effect, a realization of the minimum 
distance decoding rule. 
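
The exhaustive procedure just described can be written down directly. The Python sketch below uses a small (5, 2) code with generator rows 10101 and 01011 purely as an illustrative assumption; the function name is hypothetical.

```python
# Illustrative (5, 2) code: all four code words generated by rows 10101 and 01011
CODE_WORDS = [(0, 0, 0, 0, 0), (0, 1, 0, 1, 1), (1, 0, 1, 0, 1), (1, 1, 1, 1, 0)]


def min_distance_decode(y, code_words):
    """Return the code word closest to y in Hamming distance, and that distance."""
    best, best_weight = None, None
    for c in code_words:
        e = [a ^ b for a, b in zip(y, c)]   # error vector: y + c (modulo 2)
        weight = sum(e)                     # number of 1s in the error vector
        if best_weight is None or weight < best_weight:
            best, best_weight = c, weight
    return best, best_weight


print(min_distance_decode((1, 0, 1, 0, 0), CODE_WORDS))
```

The loop visits all M code words, so the cost grows exponentially with k; this is the inefficiency that motivates decoding with the parity check matrix.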

A more efficient method for hard-decision decoding makes use of the parity 
check matrix H. To elaborate, suppose that $\mathbf{C}_m$ is the transmitted code word
and Y is the received code word at the output of the demodulator. In general,
Y may be expressed as

$$\mathbf{Y} = \mathbf{C}_m + \mathbf{e}$$

where e denotes an arbitrary binary error vector. The product $\mathbf{Y}\mathbf{H}'$ yields

$$\mathbf{Y}\mathbf{H}' = (\mathbf{C}_m + \mathbf{e})\mathbf{H}' = \mathbf{C}_m\mathbf{H}' + \mathbf{e}\mathbf{H}' = \mathbf{e}\mathbf{H}' = \mathbf{S}$$    (8-1-76)

where the (n - k)-dimensional vector S is called the syndrome of the error
pattern. In other words, the vector S has components that are zero for all parity 
check equations that are satisfied and nonzero for all parity check equations 
that are not satisfied. Thus, S contains the pattern of failures in the parity 
checks. 

We emphasize that the syndrome S is a characteristic of the error pattern 
and not of the transmitted code word. Furthermore, we observe that there are
$2^n$ possible error patterns and only $2^{n-k}$ syndromes. Consequently, different
error patterns result in the same syndrome.

Suppose we construct a decoding table in which we list all the $2^k$ possible
code words in the first row, beginning with the all-zero code word in the first
(left-most) column. This all-zero code word also represents the all-zero error
pattern. We fill in the first column by listing first all n error patterns $\{\mathbf{e}_i\}$ of
weight 1. If $n + 1 < 2^{n-k}$, we may then list double error patterns, then triple
error patterns, etc., until we have a total of $2^{n-k}$ entries in the first column.
Thus, the number of rows that we can have is $2^{n-k}$, which is equal to the
number of syndromes. Next, we add each error pattern in the first column to
the corresponding code words. Thus, we fill in the remainder of the $2^{n-k} \times 2^k$
table as follows:

$$\begin{array}{lllll}
\mathbf{C}_1 & \mathbf{C}_2 & \mathbf{C}_3 & \cdots & \mathbf{C}_{2^k} \\
\mathbf{e}_2 & \mathbf{C}_2 + \mathbf{e}_2 & \mathbf{C}_3 + \mathbf{e}_2 & \cdots & \mathbf{C}_{2^k} + \mathbf{e}_2 \\
\mathbf{e}_3 & \mathbf{C}_2 + \mathbf{e}_3 & \mathbf{C}_3 + \mathbf{e}_3 & \cdots & \mathbf{C}_{2^k} + \mathbf{e}_3 \\
\vdots & \vdots & \vdots & & \vdots \\
\mathbf{e}_{2^{n-k}} & \mathbf{C}_2 + \mathbf{e}_{2^{n-k}} & \mathbf{C}_3 + \mathbf{e}_{2^{n-k}} & \cdots & \mathbf{C}_{2^k} + \mathbf{e}_{2^{n-k}}
\end{array}$$

This table is called a standard array. Each row, including the first, consists of $2^k$
possible received code words that would result from the corresponding error
pattern in the first column. Each row is called a coset and the first (left-most)
code word (or error pattern) is called a coset leader. Therefore, a coset consists
of all the possible received code words resulting from a particular error pattern
(coset leader).


Example 8-1-10 


Let us construct the standard array for the (5,2) systematic code with
generator matrix given by

$$\mathbf{G} = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 1 \end{bmatrix}$$

This code has a minimum distance $d_{\min} = 3$. The standard array is given in
Table 8-1-7. Note that in this code, the coset leaders consist of the all-zero
error pattern, five error patterns of weight 1, and two error patterns of
weight 2. Although many more double error patterns exist, there is only
room for two to complete the table. These were selected such that their
corresponding syndromes are distinct from those of the single error
patterns.


TABLE 8-1-7 STANDARD ARRAY FOR THE (5,2) CODE

Code words
0 0 0 0 0    0 1 0 1 1    1 0 1 0 1    1 1 1 1 0

0 0 0 0 1    0 1 0 1 0    1 0 1 0 0    1 1 1 1 1
0 0 0 1 0    0 1 0 0 1    1 0 1 1 1    1 1 1 0 0
0 0 1 0 0    0 1 1 1 1    1 0 0 0 1    1 1 0 1 0
0 1 0 0 0    0 0 0 1 1    1 1 1 0 1    1 0 1 1 0
1 0 0 0 0    1 1 0 1 1    0 0 1 0 1    0 1 1 1 0
1 1 0 0 0    1 0 0 1 1    0 1 1 0 1    0 0 1 1 0
1 0 0 1 0    1 1 0 0 1    0 0 1 1 1    0 1 1 0 0

Now, suppose that $\mathbf{e}_i$ is a coset leader and that $\mathbf{C}_m$ was the transmitted code
word. Then, the error pattern $\mathbf{e}_i$ would result in the received code word

$$\mathbf{Y} = \mathbf{C}_m + \mathbf{e}_i$$

The syndrome is

$$\mathbf{S} = (\mathbf{C}_m + \mathbf{e}_i)\mathbf{H}' = \mathbf{C}_m\mathbf{H}' + \mathbf{e}_i\mathbf{H}' = \mathbf{e}_i\mathbf{H}'$$

Clearly, all received code words in the same coset have the same syndrome,
since the latter depends only on the error pattern. Furthermore, each coset has
a different syndrome. Having established this characteristic of the standard
array, we may simply construct a syndrome decoding table in which we list the
$2^{n-k}$ syndromes and the corresponding $2^{n-k}$ coset leaders that represent the
minimum weight error patterns. Then, given a received code vector Y, we
compute the syndrome

$$\mathbf{S} = \mathbf{Y}\mathbf{H}'$$

For the computed S, we find the corresponding (most likely) error vector, say
$\hat{\mathbf{e}}_m$. This error vector is added to Y to yield the decoded word

$$\hat{\mathbf{C}}_m = \mathbf{Y} \oplus \hat{\mathbf{e}}_m$$


Example 8-1-11 

Consider the (5,2) code with the standard array given in Table 8-1-7. The 
syndromes versus the most likely error patterns are given in Table 8-1-8. 
Now suppose the actual error vector on the channel is 

e = [1 0 1 0 0] 


TABLE 8-1-8 SYNDROME TABLE FOR THE (5,2) CODE

Syndrome    Error pattern
0 0 0       0 0 0 0 0
0 0 1       0 0 0 0 1
0 1 0       0 0 0 1 0
1 0 0       0 0 1 0 0
0 1 1       0 1 0 0 0
1 0 1       1 0 0 0 0
1 1 0       1 1 0 0 0
1 1 1       1 0 0 1 0


The syndrome computed for the error is S = [0 0 1]. Hence, the error
determined from the table is $\hat{\mathbf{e}}$ = [0 0 0 0 1]. When $\hat{\mathbf{e}}$ is added to Y, the
result is a decoding error. In other words, the (5, 2) code corrects all single
errors and only two double errors, namely [1 1 0 0 0] and [1 0 0 1 0].
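
The table-lookup decoder of this example can be sketched in Python as follows. The parity check matrix H used here is an assumption: it is built as [P' | I] from the systematic generator matrix with rows 10101 and 01011, which reproduces the syndromes listed for the (5, 2) code. The names `syndrome` and `decode` are hypothetical.

```python
# Assumed parity check matrix H = [P' | I_3] for G = [I_2 | P], P = [[1,0,1],[0,1,1]]
H = [
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
]

# Syndrome -> coset leader, as listed in the syndrome table for the (5, 2) code
SYNDROME_TABLE = {
    (0, 0, 0): (0, 0, 0, 0, 0), (0, 0, 1): (0, 0, 0, 0, 1),
    (0, 1, 0): (0, 0, 0, 1, 0), (1, 0, 0): (0, 0, 1, 0, 0),
    (0, 1, 1): (0, 1, 0, 0, 0), (1, 0, 1): (1, 0, 0, 0, 0),
    (1, 1, 0): (1, 1, 0, 0, 0), (1, 1, 1): (1, 0, 0, 1, 0),
}


def syndrome(y):
    """S = Y H' over GF(2)."""
    return tuple(sum(h * b for h, b in zip(row, y)) % 2 for row in H)


def decode(y):
    """Add the coset leader selected by the syndrome to the received word."""
    e_hat = SYNDROME_TABLE[syndrome(y)]
    return tuple(a ^ b for a, b in zip(y, e_hat))


# All-zero code word sent, channel error [1 0 1 0 0]: the decoder picks
# the weight-1 leader [0 0 0 0 1] and outputs a wrong code word.
y = (1, 0, 1, 0, 0)
print(syndrome(y), decode(y))
```

The double error [1 1 0 0 0], by contrast, is a coset leader and is therefore corrected.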


Syndrome Decoding of Cyclic Codes  As described above, hard-decision
decoding of a linear block code may be accomplished by first computing the
syndrome $\mathbf{S} = \mathbf{Y}\mathbf{H}'$, then using a table lookup to find the most probable error
pattern e corresponding to the computed syndrome S, and, finally, adding the
error pattern e to the received vector Y to obtain the most probable code word
$\hat{\mathbf{C}}_m$. When the code is cyclic, the syndrome computation may be performed by
a shift register similar in form to that used for encoding. 

To elaborate, let us consider a systematic cyclic code and let us represent 
the received code vector Y by the polynomial Y(p). In general, Y = C + e, 
where C is the transmitted code word and e is the error vector. Hence, we have 


Y(p) = C(p) + e(p) 

= X(p)g(p) + e(p) (8-1-77) 


Now, suppose we divide Y(p) by the generator polynomial g(p). This division
will yield

$$\frac{Y(p)}{g(p)} = Q(p) + \frac{R(p)}{g(p)}$$

or, equivalently,

$$Y(p) = Q(p)g(p) + R(p)$$    (8-1-78)

The remainder R(p) is a polynomial of degree less than or equal to n - k - 1.
If we combine (8-1-77) with (8-1-78), we obtain

$$e(p) = [X(p) + Q(p)]g(p) + R(p)$$    (8-1-79)

This relationship illustrates that the remainder R(p) obtained from dividing
Y(p) by g(p) depends only on the error polynomial e(p), and, hence, R(p) is
simply the syndrome S(p) associated with the error pattern e. Therefore,

$$Y(p) = Q(p)g(p) + S(p)$$    (8-1-80)

where S(p) is the syndrome polynomial of degree less than or equal to
n - k - 1. If g(p) divides Y(p) exactly then S(p) = 0 and the received decoded
word is $\hat{\mathbf{C}}_m = \mathbf{Y}$.

The division of Y(p) by the generator polynomial g(p) may be carried out
by means of a shift register which performs division as described previously.
First the received vector Y is shifted into an (n - k)-stage shift register as
illustrated in Fig. 8-1-9. Initially, all the shift-register contents are zero
and the switch is closed in position 1. After the entire n-bit received vector
has been shifted into the register, the contents of the n - k stages constitute
the syndrome with the order of the bits numbered as shown in Fig. 8-1-9. These
bits may be clocked out by throwing the switch into position 2. Given the
syndrome from the (n - k)-stage shift register, a table lookup may be
performed to identify the most probable error vector.


Example 8-1-12 

Let us consider the syndrome computation for the (7,4) cyclic Hamming
code generated by the polynomial g(p) = p^3 + p + 1. Suppose that the
received vector is Y = [1 0 0 1 1 0 1]. This is fed into the three-stage
register shown in Fig. 8-1-10. After seven shifts the contents of the shift
register are 110, which corresponds to the syndrome S = [0 1 1]. The most
probable error vector corresponding to this syndrome is
\hat{e} = [0 0 0 1 0 0 0] and, hence,

    \hat{C}_m = Y + \hat{e} = [1 0 0 0 1 0 1]

The information bits are 1 0 0 0.
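
The shift-register division in Example 8-1-12 can be mimicked in software by polynomial long division over GF(2). The sketch below is an illustration under stated assumptions: polynomials are stored as Python integers with bit i holding the coefficient of p^i, the leftmost element of Y is taken to be the coefficient of p^6, and the helper name `gf2_remainder` is hypothetical.

```python
def gf2_remainder(dividend: int, divisor: int) -> int:
    """Remainder of polynomial division over GF(2).

    Bit i of each integer holds the coefficient of p^i.
    """
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        # Cancel the leading term of the dividend with a shifted copy of g(p).
        dividend ^= divisor << (dividend.bit_length() - dlen)
    return dividend


N = 7
G = 0b1011      # g(p) = p^3 + p + 1
Y = 0b1001101   # Y = [1 0 0 1 1 0 1], leftmost bit taken as the p^6 coefficient

# Syndrome -> single-error pattern, built by dividing each p^i by g(p).
single_error_table = {gf2_remainder(1 << i, G): 1 << i for i in range(N)}

s = gf2_remainder(Y, G)
e_hat = single_error_table.get(s, 0)  # a zero syndrome means no correction
c_hat = Y ^ e_hat

print(format(s, "03b"), format(e_hat, "07b"), format(c_hat, "07b"))
```

Dividing the corrected word by g(p) again leaves a zero remainder, which confirms that the result is a code word.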


FIGURE 8-1-10 Syndrome computation for the (7, 4) cyclic code with generator
polynomial g(p) = p^3 + p + 1 and received vector Y = [1 0 0 1 1 0 1].

The table lookup decoding method using the syndrome is practical only
when n - k is small, e.g., n - k < 10. This method is impractical for many
interesting and powerful codes. For example, if n - k = 20, the table has 2^20
(approximately 1 million) entries. Such a large amount of storage and the time
required to locate an entry in such a large table render the table lookup
decoding method impractical for long codes having large numbers of check
bits.

More efficient and practical hard-decision decoding algorithms have been 
devised for the class of cyclic codes and, more specifically, the BCH codes. A 
description of these algorithms requires further development of computational 
methods with finite fields, which is beyond the scope of our treatment of 
coding theory. It suffices to indicate that efficient decoding algorithms exist 
which make it possible to implement long BCH codes with high redundancy in 
practical digital communications systems. The interested reader is referred to 
the texts of Peterson and Weldon (1972), Lin and Costello (1983), Blahut
(1983), and Berlekamp (1968), and to the paper by Forney (1965). 


Error Detection and Error Correction Capability It is clear from the
discussion above that when the syndrome consists of all zeros, the received
code word is one of the 2^k possible transmitted code words. Since the minimum
separation between a pair of code words is d_min, it is possible for an error
pattern of weight d_min to transform one of these 2^k code words in the code
into another code word. When this happens we have an undetected error. On the
other hand, if the actual number of errors is less than d_min, the syndrome
will have a nonzero weight. When this occurs, we have detected the presence of
one or more errors on the channel. Clearly, the (n, k) block code is capable of
detecting d_min - 1 errors. Error detection may be used in conjunction with an
automatic repeat-request (ARQ) scheme for retransmission of the code word.

The error correction capability of a code also depends on the minimum
distance. However, the number of correctable error patterns is limited by the
number of possible syndromes or coset leaders in the standard array. To
determine the error correction capability of an (n, k) code, it is convenient
to view the 2^k code words as points in an n-dimensional space. If each code
word is viewed as the center of a sphere of radius (Hamming distance) t, the
largest value that t may have without intersection (or tangency) of any pair of
the 2^k spheres is t = \lfloor \frac{1}{2}(d_min - 1) \rfloor, where
\lfloor x \rfloor denotes the largest integer contained in x. Within each
sphere lie all the possible received code words of distance less than or equal
to t from the valid code word. Consequently, any received code vector that
falls within a sphere is decoded into the valid code word at the center of the
sphere. This implies that an (n, k) code with minimum distance d_min is capable
of correcting t = \lfloor \frac{1}{2}(d_min - 1) \rfloor errors. Figure 8-1-11
is a two-dimensional representation of the code words and the spheres.

As described above, a code may be used to detect d_min - 1 errors or to
correct t = \lfloor \frac{1}{2}(d_min - 1) \rfloor errors. Clearly, to correct
t errors implies that we have detected t errors. However, it is also possible
to detect more than t errors if we compromise in the error correction
capability of the code. For example, a code with d_min = 7 can correct t = 3
errors. If we wish to detect four errors, we can do so by reducing the radius
of the sphere around each code word from 3 to 2. Thus, patterns with four
errors are detectable but only patterns of two errors are correctable. In other
words, when only two errors occur, these are corrected, and when three or four
errors occur, the receiver may ask for a retransmission. If more than four
errors occur, they will go undetected if the code word falls within a sphere of
radius 2. Similarly, for d_min = 7, five errors can be detected and one error
corrected. In general, a code with minimum distance d_min can detect e_d errors
and correct e_c errors, where

    e_d + e_c \le d_{\min} - 1

and

    e_c \le e_d

FIGURE 8-1-11
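
The trade-off between detection and correction can be enumerated directly from the two inequalities. The following is a small illustrative sketch; the function names are hypothetical, and for each e_c it lists the largest e_d that the first inequality allows.

```python
def correction_radius(d_min: int) -> int:
    """Largest t with t <= (d_min - 1) / 2."""
    return (d_min - 1) // 2


def detect_correct_pairs(d_min: int):
    """Pairs (e_c, e_d) with e_c <= e_d and e_c + e_d = d_min - 1."""
    return [(e_c, d_min - 1 - e_c) for e_c in range(correction_radius(d_min) + 1)]


print(detect_correct_pairs(7))
```

For d_min = 7 the list contains the cases mentioned in the text: correct two and detect four, or correct one and detect five.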


Probability of Error Based on Error Correction We conclude this section 
with the derivation of the probability of error for hard-decision decoding of 
linear binary block codes based on error correction only. 

From the above discussion, it is clear that the optimum decoder for a binary
symmetric channel will decode correctly if (but not necessarily only if) the
number of errors in a code word is less than half the minimum distance d_min of
the code. That is, any number of errors up to

    t = \lfloor \frac{1}{2}(d_{\min} - 1) \rfloor

are always correctable. Since the binary symmetric channel is memoryless, the
bit errors occur independently. Hence, the probability of m errors in a block
of n bits is

    P(m, n) = \binom{n}{m} p^m (1 - p)^{n-m}                      (8-1-81)

and, therefore, the probability of a code word error is upper-bounded by the
expression

    P_M \le \sum_{m=t+1}^{n} P(m, n)                              (8-1-82)

Equality holds in (8-1-82) if the linear block code is a perfect code. In order
to describe the basic characteristics of a perfect code, suppose we place a
sphere of radius t around each of the possible transmitted code words. Each
sphere around a code word contains the set of all code words of Hamming
distance less than or equal to t from the code word. Now, the number of code
words in a sphere of radius t = \lfloor \frac{1}{2}(d_min - 1) \rfloor is

    1 + \binom{n}{1} + \binom{n}{2} + ... + \binom{n}{t} = \sum_{i=0}^{t} \binom{n}{i}

Since there are M = 2^k possible transmitted code words, there are 2^k
nonoverlapping spheres each having a radius t. The total number of code
words enclosed in the 2^k spheres cannot exceed the 2^n possible received code
words. Thus, a t-error correcting code must satisfy the inequality

    2^k \sum_{i=0}^{t} \binom{n}{i} \le 2^n

or, equivalently,

    2^{n-k} \ge \sum_{i=0}^{t} \binom{n}{i}                       (8-1-83)

A perfect code has the property that all spheres of Hamming distance
t = \lfloor \frac{1}{2}(d_min - 1) \rfloor around the M = 2^k possible
transmitted code words are disjoint and every received code word falls in one
of the spheres. Thus, every received code word is at most at distance t from
one of the possible transmitted code words and (8-1-83) holds with equality.
For such a code, all error patterns of weight less than or equal to t are
corrected by the optimum (minimum distance) decoder. On the other hand, any
error pattern of weight t + 1 or greater cannot be corrected. Consequently, the
expression for the error probability given in (8-1-82) holds with equality. The
Golay (23, 12) code, having d_min = 7 and t = 3, is a perfect code. The Hamming
codes, which have the parameters n = 2^{n-k} - 1, d_min = 3, and t = 1, are
also perfect codes. These two nontrivial codes and the trivial code consisting
of two code words of odd length n and d_min = n are the only perfect binary
block codes. These codes are optimum on the BSC in the sense that they result
in a minimum error probability among all codes having the same block length and
the same number of information bits.
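
Whether a set of code parameters meets (8-1-83) with equality is easy to check numerically. The sketch below is illustrative only; `sphere_volume` and `meets_hamming_bound_with_equality` are hypothetical names, and the check assumes the stated d_min is the true minimum distance of the code.

```python
from math import comb


def sphere_volume(n: int, t: int) -> int:
    """Number of binary n-tuples within Hamming distance t of a given word."""
    return sum(comb(n, i) for i in range(t + 1))


def meets_hamming_bound_with_equality(n: int, k: int, d_min: int) -> bool:
    """True when 2^(n-k) equals the sphere volume for t = floor((d_min - 1)/2)."""
    t = (d_min - 1) // 2
    return 2 ** (n - k) == sphere_volume(n, t)


print(meets_hamming_bound_with_equality(23, 12, 7))  # Golay (23, 12)
print(meets_hamming_bound_with_equality(7, 4, 3))    # Hamming (7, 4)
print(meets_hamming_bound_with_equality(5, 1, 5))    # two code words, odd n
print(meets_hamming_bound_with_equality(5, 2, 3))    # (5,2) code, d_min = 3
```

The first three parameter sets satisfy the bound with equality, while the (5,2) code of Example 8-1-11 does not: its eight cosets exceed the six patterns of weight at most one.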

The optimality property defined above also holds for quasiperfect codes. A
quasiperfect code is characterized by the property that all spheres of Hamming
radius t around the M possible transmitted code words are disjoint and every
received code word is at most at distance t + 1 from one of the possible
transmitted code words. For such a code, all error patterns of weight less than
or equal to t and some error patterns of weight t + 1 are correctable, but any
error pattern of weight t + 2 or greater leads to incorrect decoding of the code
word. Clearly, (8-1-82) is an upper bound on the error probability and

    P_M \ge \sum_{m=t+2}^{n} P(m, n)                              (8-1-84)

is a lower bound.

A more precise measure of the performance for quasiperfect codes can be
obtained by making use of the inequality in (8-1-83). That is, the total number
of code words outside the 2^k spheres of radius t is

    2^n - 2^k \sum_{i=0}^{t} \binom{n}{i}

If these code words are equally subdivided into 2^k sets and each set is
associated with one of the 2^k spheres, then each sphere is enlarged by the
addition of

    \beta_{t+1} = 2^{n-k} - \sum_{i=0}^{t} \binom{n}{i}           (8-1-85)

code words having distance t + 1 from the transmitted code word. Consequently,
of the \binom{n}{t+1} error patterns of distance t + 1 from each code word,
we can correct \beta_{t+1} error patterns. Thus, the error probability for
decoding the quasiperfect code may be expressed as

    P_M = \sum_{m=t+2}^{n} P(m, n) + [\binom{n}{t+1} - \beta_{t+1}] p^{t+1} (1 - p)^{n-t-1}    (8-1-86)
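
As a numerical illustration of (8-1-85) and (8-1-86), consider the (5,2) code of Example 8-1-11 with t = 1. Its coset leaders in Table 8-1-8 all have weight at most t + 1 = 2, so it is treated here as quasiperfect; that reading, the crossover probability p = 0.01, and the function name are assumptions of this sketch.

```python
from math import comb


def quasiperfect_error_prob(n: int, k: int, t: int, p: float) -> float:
    """Evaluate (8-1-86) for a quasiperfect (n, k) code on a BSC."""
    beta = 2 ** (n - k) - sum(comb(n, i) for i in range(t + 1))  # (8-1-85)
    tail = sum(
        comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(t + 2, n + 1)
    )
    uncorrected = comb(n, t + 1) - beta  # weight-(t+1) patterns not corrected
    return tail + uncorrected * p ** (t + 1) * (1 - p) ** (n - t - 1)


p = 0.01
pm = quasiperfect_error_prob(5, 2, 1, p)

# Direct count for the (5,2) code: the decoder corrects the zero pattern,
# five single errors, and two double errors.
direct = 1 - ((1 - p) ** 5 + 5 * p * (1 - p) ** 4 + 2 * p**2 * (1 - p) ** 3)
print(pm, direct)
```

Here \beta_2 = 8 - 6 = 2, matching the two correctable double errors in the syndrome table, and the two evaluations agree.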

There are many known quasiperfect codes, although they do not exist for
all choices of n and k. Since such codes are optimum for the binary symmetric
channel, any (n, k) linear block code must have an error probability that is at
least as large as (8-1-86). Consequently, (8-1-86) is a lower bound on the
probability of error for any (n, k) linear block code, where t is the largest
integer such that \beta_{t+1} \ge 0.

Another pair of upper and lower bounds is obtained by considering two
code words that differ by the minimum distance. First, we note that P_M cannot
be less than the probability of erroneously decoding the transmitted code word
as its nearest neighbor, which is at distance d_min from the transmitted code
word. That is,

    P_M \ge \sum_{m=\lfloor d_{\min}/2 \rfloor + 1}^{d_{\min}} \binom{d_{\min}}{m} p^m (1 - p)^{d_{\min}-m}    (8-1-87)

On the other hand, P_M cannot be greater than M - 1 times the probability of
erroneously decoding the transmitted code word as its nearest neighbor, which
is at distance d_min from the transmitted code word. That is a union bound,
which is expressed as

    P_M \le (M - 1) \sum_{m=\lfloor d_{\min}/2 \rfloor + 1}^{d_{\min}} \binom{d_{\min}}{m} p^m (1 - p)^{d_{\min}-m}    (8-1-88)

When M is large, the lower bound in (8-1-87) and the upper bound in (8-1-88)
are very loose.

A tight upper bound on P_M can be obtained by applying the Chernoff bound
presented earlier in Section 2-1-6. We assume again that the all-zero code word
was transmitted. In comparing the received code word to the all-zero code word
and to a code word of weight w_m, the probability of a decoding error, obtained
from the Chernoff bound (Problem 8-22), is upper-bounded by the expression

    P_2(w_m) \le [4p(1 - p)]^{w_m/2}                              (8-1-89)

The union of these binary decisions yields the upper bound

    P_M \le \sum_{m=2}^{M} [4p(1 - p)]^{w_m/2}                    (8-1-90)

A simpler version of (8-1-90) is obtained if we employ d_min in place of the
weight distribution. That is,

    P_M \le (M - 1)[4p(1 - p)]^{d_{\min}/2}                       (8-1-91)

Of course (8-1-90) is a tighter upper bound than (8-1-91).

In Section 8-1-6, we compare the various bounds given above for a specific
code, namely, the Golay (23, 12) code. In addition, we compare the error rate
performance of hard-decision and soft-decision decoding.


8-1-6 Comparison of Performance between Hard-Decision
and Soft-Decision Decoding

It is both interesting and instructive to compare the bounds on the error rate
performance of linear block codes for soft-decision decoding and hard-decision
decoding on an AWGN channel. For illustrative purposes, we shall use the
Golay (23, 12) code, which has the relatively simple weight distribution given
in Table 8-1-1. As stated previously, this code has a minimum distance
d_min = 7.

First we compute and compare the bounds on the error probability for
hard-decision decoding. Since the Golay (23, 12) code is a perfect code, the
exact error probability for hard-decision decoding is

    P_M = \sum_{m=4}^{23} \binom{23}{m} p^m (1 - p)^{23-m}
        = 1 - \sum_{m=0}^{3} \binom{23}{m} p^m (1 - p)^{23-m}     (8-1-92)

where p is the probability of a binary digit error for the binary symmetric 
channel. Binary (or four-phase) coherent PSK is assumed to be the 
modulation/demodulation technique for the transmission and reception of the 
binary digits contained in each code word. Thus, the appropriate expression for 
p is given by (8-1-73). In addition to the exact error probability given by 
(8-1-92), we have the lower bound given by (8-1-87) and the three upper 
bounds given by (8-1-88), (8-1-90), and (8-1-91). 
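
The exact expression and the bounds can be evaluated side by side. The sketch below is illustrative: the weight distribution is the standard one for the Golay (23, 12) code, reproduced here as an assumption because Table 8-1-1 lies outside this section, p is computed as Q(\sqrt{2 \gamma_b R_c}) for coherent binary PSK, and all function names are hypothetical.

```python
from math import comb, erfc, sqrt

N, K, D_MIN, T = 23, 12, 7, 3
M = 2**K
# Golay (23, 12) weight distribution, weight -> number of code words
# (assumed here; the text refers to Table 8-1-1 for these values).
WEIGHTS = {0: 1, 7: 253, 8: 506, 11: 1288, 12: 1288, 15: 506, 16: 253, 23: 1}


def bit_error_prob(snr_per_bit_db: float) -> float:
    """p = Q(sqrt(2 * gamma_b * R_c)), using Q(x) = erfc(x / sqrt(2)) / 2."""
    gamma_b = 10 ** (snr_per_bit_db / 10)
    return 0.5 * erfc(sqrt(gamma_b * K / N))


def exact(p):  # (8-1-92)
    return sum(comb(N, m) * p**m * (1 - p) ** (N - m) for m in range(T + 1, N + 1))


def pairwise(p):  # sum shared by (8-1-87) and (8-1-88)
    return sum(
        comb(D_MIN, m) * p**m * (1 - p) ** (D_MIN - m)
        for m in range(D_MIN // 2 + 1, D_MIN + 1)
    )


def lower_bound(p):  # (8-1-87)
    return pairwise(p)


def union_bound(p):  # (8-1-88)
    return (M - 1) * pairwise(p)


def chernoff_weights(p):  # (8-1-90)
    return sum(a * (4 * p * (1 - p)) ** (w / 2) for w, a in WEIGHTS.items() if w > 0)


def chernoff_dmin(p):  # (8-1-91)
    return (M - 1) * (4 * p * (1 - p)) ** (D_MIN / 2)


for snr_db in (4, 6, 8):
    p = bit_error_prob(snr_db)
    print(snr_db, lower_bound(p), exact(p), union_bound(p),
          chernoff_weights(p), chernoff_dmin(p))
```

Sweeping the SNR per bit over a finer grid and plotting the five columns on a logarithmic scale gives curves of the kind compared in the discussion that follows.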

Numerical results obtained from these bounds are compared with the exact
error probability in Fig. 8-1-12. We observe that the lower bound is very
loose. At P_M = 10^-5, the lower bound is off by approximately 2 dB from the
exact error probability. At P_M = 10^-2, the difference increases to
approximately 4 dB. Of the three upper bounds, the one given by (8-1-88) is the
tightest: it differs by less than 1 dB from the exact error probability at
P_M = 10^-5. The Chernoff bound in (8-1-90), which employs the weight
distribution, is also relatively tight. Finally, the Chernoff bound that
employs only the minimum distance of the code is the poorest of the three. At
P_M = 10^-5, it differs from the exact error probability by approximately 2 dB.
All three upper bounds are very loose for error rates above P_M = 10^-2.

FIGURE 8-1-12 Comparison of bounds with exact error probability for
hard-decision decoding of Golay (23, 12) code.

FIGURE 8-1-13 Comparison of soft-decision decoding with hard-decision
decoding for the Golay (23, 12) code.

It is also interesting to compare the performance between soft- and
hard-decision decoding. For this comparison, we use the upper bounds on the
error probability for soft-decision decoding given by (8-1-52) and the exact
error probability for hard-decision decoding given by (8-1-92). Figure 8-1-13
illustrates these performance characteristics. We observe that the two bounds
for soft-decision decoding differ by approximately 0.5 dB at P_M = 10^-6 and by
approximately 1 dB at P_M = 10^-2. We also observe that the difference in
performance between hard- and soft-decision decoding is approximately 2 dB
in the range 10^-6 < P_M < 10^-2. In the range P_M > 10^-2, the curve of the
error probability for hard-decision decoding crosses the curves for the bounds.
This behavior indicates that the bounds for soft-decision decoding are loose
when P_M > 10^-2.

The 2 dB difference between hard- and soft-decision decoding is a
characteristic that applies not only to the Golay code, but is a fundamental
result that applies in general to coded digital communications over the AWGN
channel. This result is derived below by computing the capacity of the AWGN
channel with hard- and soft-decision decoding.

FIGURE 8-1-14 Code rate as a function of the minimum SNR per bit for
soft- and hard-decision decoding.

The channel capacity of the BSC in bits per code symbol, derived in Section
7-1-2, is

    C = 1 + p \log_2 p + (1 - p) \log_2 (1 - p)                   (8-1-93)

where the probability of a bit error for binary, coherent PSK on an AWGN
channel is given by (8-1-73). Suppose we use (8-1-73) for p, let C = R_c in
(8-1-93), and then determine the value of \gamma_b that satisfies this equation.
The result is shown in Fig. 8-1-14 as a graph of R_c versus \gamma_b. For
example, suppose that we are interested in using a code with rate R_c = 1/2.
For this code rate, note that the minimum SNR per bit required to achieve
capacity with hard-decision decoding is approximately 1.6 dB.

What is the limit on the minimum SNR as the code rate approaches zero?
For small values of R_c, the probability p can be approximated as

    p \approx \frac{1}{2} - \sqrt{\gamma_b R_c / \pi}             (8-1-94)

When the expression for p is substituted into (8-1-93) and the logarithms in
(8-1-93) are approximated by

    \log_2 (1 + x) \approx (x - \frac{1}{2} x^2) / \ln 2

the channel capacity formula reduces to

    C = \frac{2}{\pi \ln 2} \gamma_b R_c                          (8-1-95)

Now we set C = R_c. Thus, in the limit as R_c approaches zero, we obtain the
result

    \gamma_b = \frac{1}{2} \pi \ln 2    (0.37 dB)                 (8-1-96)
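
The hard-decision curve of minimum SNR per bit versus code rate can be traced numerically by solving C = R_c for \gamma_b. The following is a minimal sketch under stated assumptions: p = Q(\sqrt{2 \gamma_b R_c}) as for coherent binary PSK, a simple geometric bisection over an assumed search interval, and hypothetical function names.

```python
from math import erfc, log, log2, log10, pi, sqrt


def bsc_capacity(p: float) -> float:
    """Capacity (8-1-93) of the BSC in bits per code symbol."""
    if p <= 0.0 or p >= 1.0:
        return 1.0
    return 1 + p * log2(p) + (1 - p) * log2(1 - p)


def crossover(gamma_b: float, rate: float) -> float:
    """p = Q(sqrt(2 * gamma_b * R_c)), using Q(x) = erfc(x / sqrt(2)) / 2."""
    return 0.5 * erfc(sqrt(gamma_b * rate))


def min_snr_hard_db(rate: float, lo: float = 1e-3, hi: float = 1e3,
                    iters: int = 200) -> float:
    """Bisect (on a logarithmic scale) for the gamma_b at which C = R_c."""
    for _ in range(iters):
        mid = sqrt(lo * hi)
        if bsc_capacity(crossover(mid, rate)) < rate:
            lo = mid   # capacity too small, need more SNR
        else:
            hi = mid
    return 10 * log10(sqrt(lo * hi))


print(min_snr_hard_db(1e-3))           # a very low code rate
print(10 * log10(0.5 * pi * log(2)))   # limiting value from (8-1-96)
```

Evaluating `min_snr_hard_db` over a grid of rates between 0 and 1 gives the hard-decision curve; at very low rates the result approaches the limit in (8-1-96).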

The capacity of the binary-input AWGN channel with soft-decision decoding
can be computed in a similar manner. The expression for the capacity in
bits per code symbol, derived in Section 7-1-2, is

    C = \frac{1}{2} \sum_{k=0}^{1} \int_{-\infty}^{\infty} p(y | k) \log_2 \frac{p(y | k)}{p(y)} dy    (8-1-97)

where p(y | k), k = 0, 1, denote the probability density functions of the
demodulator output conditioned on the transmitted bit being a 0 and a 1,
respectively. For the AWGN channel, we have

    p(y | k) = \frac{1}{\sqrt{2\pi}\sigma} e^{-(y - m_k)^2 / 2\sigma^2},   k = 0, 1    (8-1-98)

where m_0 = -\sqrt{\mathcal{E}_c}, m_1 = \sqrt{\mathcal{E}_c},
\sigma^2 = \frac{1}{2} N_0, and \mathcal{E}_c = R_c \mathcal{E}_b. The
unconditional probability density p(y) is simply one-half of the sum of
p(y | 1) and p(y | 0). As R_c approaches zero, the expression (8-1-97) for the
channel capacity can be approximated as

    C \approx \gamma_b R_c / \ln 2                                (8-1-99)

Again, we set C = R_c. Thus, as R_c \to 0, the minimum SNR per bit to achieve
capacity is

    \gamma_b = \ln 2    (-1.6 dB)                                 (8-1-100)

By using (8-1-98) in (8-1-97) and setting C = R_c, a numerical solution can be
obtained for code rates in the range 0 \le R_c \le 1. The result of this
solution is also shown in Fig. 8-1-14.

From the above, we observe that in the limit as R_c approaches zero, the
difference in SNR \gamma_b between hard- and soft-decision decoding is
\frac{1}{2}\pi, which is approximately 2 dB. On the other hand, as R_c
increases toward unity, the difference in \gamma_b between these two decoding
techniques decreases. For example, at R_c = 0.8, the difference is about
1.5 dB.

The curves in Fig. 8-1-14 provide more information than just the difference
in performance between soft- and hard-decision decoding. These curves also
specify the minimum SNR per bit that is required for a given code rate. For
example, a code rate of R_c = 0.8 can provide arbitrarily small error
probability at an SNR per bit of 2 dB, when soft-decision decoding is used. By
comparison, an uncoded binary PSK requires 9.6 dB to achieve an error
probability of 10^-5. Hence, a 7.6 dB gain is possible by employing a rate
R_c = 4/5 code. Unfortunately, to achieve such a large coding gain usually
implies the use of an extremely long block length code, which leads to a very
complex receiver. Nevertheless, the curves in Fig. 8-1-14 provide a benchmark
for comparing the coding gains achieved by practically implementable codes with
the ultimate limits for either soft- or hard-decision decoding.
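
The decibel figures quoted in this comparison can be checked with a few lines of arithmetic. The sketch below is illustrative and assumes uncoded coherent binary PSK with bit error probability Q(\sqrt{2 \gamma_b}); the function names and the search interval are hypothetical choices.

```python
from math import erfc, log, log10, pi, sqrt


def uncoded_bpsk_ber(gamma_b: float) -> float:
    """Q(sqrt(2 * gamma_b)) for uncoded coherent binary PSK."""
    return 0.5 * erfc(sqrt(gamma_b))


def required_snr_db(target_ber: float, lo_db: float = 0.0,
                    hi_db: float = 20.0, iters: int = 100) -> float:
    """Bisect for the SNR per bit (in dB) that gives the target error rate."""
    for _ in range(iters):
        mid = 0.5 * (lo_db + hi_db)
        if uncoded_bpsk_ber(10 ** (mid / 10)) > target_ber:
            lo_db = mid   # error rate still too high, need more SNR
        else:
            hi_db = mid
    return 0.5 * (lo_db + hi_db)


uncoded_db = required_snr_db(1e-5)
soft_limit_db = 10 * log10(log(2))             # low-rate limit, soft decisions
hard_limit_db = 10 * log10(0.5 * pi * log(2))  # low-rate limit, hard decisions

print(round(uncoded_db, 2), round(soft_limit_db, 2), round(hard_limit_db, 2))
print(round(hard_limit_db - soft_limit_db, 2))  # equals 10 log10(pi / 2)
print(round(uncoded_db - 2.0, 1))  # gain relative to the 2 dB quoted above
```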

Instead of comparing the difference between hard- and soft-decision
decoding based on the channel capacity relations, we may perform similar
comparisons based on the random coding rate parameters. In Chapter 7, we
demonstrated that the ensemble average probability of error for randomly
selected binary code words is upper-bounded as

    \bar{P}_e < 2^{-n(R_0 - R_c)}                                 (8-1-101)

where R_c = k/n is the code rate and the cutoff rate R_0 represents the upper
bound on R_c such that \bar{P}_e \to 0 as n \to \infty. For unquantized
(soft-decision) decoding, R_0 is given as

    R_0 = \log_2 \frac{2}{1 + e^{-\mathcal{E}_c / N_0}}           (8-1-102)

where \mathcal{E}_c/N_0 = R_c \gamma_b is the SNR per dimension. This result
was derived in Section 7-2.

On the other hand, if the output of the demodulator is quantized to Q levels
prior to decoding, the Chernoff bound may be used to upper-bound the
ensemble average binary error probability \bar{P}_2(s_l, s_m) defined in
Section 7-2. The result of this derivation is the same upper bound on \bar{P}_e
given in (8-1-101) but with R_0 replaced by R_Q, where

    R_Q = \max_{\{p_j\}} \{ -\log_2 \sum_{i=0}^{Q-1} [ \sum_{j=0}^{1} p_j \sqrt{P(i | j)} ]^2 \}    (8-1-103)

In (8-1-103), {p_j} are the prior probabilities of the two signals at the input
to the channel and {P(i | j)} denote the transition probabilities of the
channel. For example, in the case of a binary symmetric channel, we have
p_1 = p_0 = 1/2, P(0 | 0) = P(1 | 1) = 1 - p, and P(0 | 1) = P(1 | 0) = p.
Hence,

    R_Q = \log_2 \frac{2}{1 + \sqrt{4p(1 - p)}},   Q = 2          (8-1-104)

where

    p = Q(\sqrt{2 \gamma_b R_c})                                  (8-1-105)

A plot of R_Q versus 10 log (\mathcal{E}_c/N_0) is illustrated in Fig. 8-1-15
for Q = 2 and Q = \infty (soft-decision decoding). Note that the difference in
decoder performance between unquantized soft-decision decoding and
hard-decision decoding is approximately 2 dB. In fact, it is easily
demonstrated again that as \mathcal{E}_c/N_0 \to 0, the loss in performance due
to hard-decision decoding is 10 \log_{10} \frac{1}{2}\pi \approx 2 dB, which is
the same decibel difference that was obtained in our comparison of the channel
capacity relations. We mention that about 1 dB of this loss can be recovered by
quantizing the output of the demodulator to three levels instead of two (see
Problem 7-11). Additional improvements are possible by quantizing the output
into more than three levels, as shown in Section 7-3.

FIGURE 8-1-15 Comparison of R_0 (soft-decision decoding) with R_Q
(hard-decision decoding) as a function of the SNR per dimension.
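
The cutoff rate comparison can be reproduced numerically. This is a hedged sketch: `r0_soft` and `r0_hard` implement (8-1-102) and (8-1-104) with the SNR per dimension given as a ratio, and `r_q` evaluates the bracketed term of (8-1-103) for fixed priors only, without performing the maximization. All three names are hypothetical.

```python
from math import erfc, exp, log2, pi, sqrt


def r0_soft(snr: float) -> float:
    """(8-1-102); snr is E_c/N_0 as a ratio, not in dB."""
    return log2(2 / (1 + exp(-snr)))


def r0_hard(snr: float) -> float:
    """(8-1-104) with p = Q(sqrt(2 * E_c/N_0)) = erfc(sqrt(E_c/N_0)) / 2."""
    p = 0.5 * erfc(sqrt(snr))
    return log2(2 / (1 + sqrt(4 * p * (1 - p))))


def r_q(priors, transitions) -> float:
    """Bracketed term of (8-1-103) for fixed priors.

    transitions[j][i] holds P(i | j); no maximization over priors is done.
    """
    levels = len(transitions[0])
    total = sum(
        sum(pj * sqrt(row[i]) for pj, row in zip(priors, transitions)) ** 2
        for i in range(levels)
    )
    return -log2(total)


# BSC with crossover probability 0.1 and equal priors.
print(r_q([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]))

# At low SNR the two cutoff rates differ by a factor close to pi/2,
# so hard decisions need about 10 log10(pi/2) dB more SNR.
snr = 1e-3
print(r0_soft(snr) / r0_hard(snr), pi / 2)
```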


8-1-7 Bounds on Minimum Distance of Linear Block Codes 

The expressions for the probability of error derived in this chapter for
soft-decision and hard-decision decoding of linear binary block codes clearly
indicate the importance that the minimum distance parameter plays in the
performance of the code. If we consider soft-decision decoding, for example,
the upper bound on the error probability given by (8-1-52) indicates that, for a
given code rate R_c = k/n, the probability of error in an AWGN channel
decreases exponentially with d_min. When this bound is used in conjunction with
the lower bound on d_min given below, we obtain an upper bound on P_M that
can be achieved by many known codes. Similarly, we may use the upper bound
given by (8-1-82) for the probability of error for hard-decision decoding in
conjunction with the lower bound on d_min to obtain an upper bound on the
error probability for linear binary block codes on the binary symmetric
channel.

On the other hand, an upper bound on d_min can be used to determine a
lower bound on the probability of error achieved by the best code. For
example, suppose that hard-decision decoding is employed. In this case, we
have the two lower bounds on P_M given by (8-1-86) and (8-1-87), with the
former being the tighter. When either one of these two bounds is used in
conjunction with an upper bound on d_min, the result is a lower bound on P_M
for the best (n, k) code. Thus, upper and lower bounds on d_min are important
in assessing the capabilities of codes.

A simple upper bound on the minimum distance of an (n, k) binary or
nonbinary linear block code was given in (8-1-14) as d_min \le n - k + 1. It is
convenient to normalize this expression by the block size n. That is,

    \frac{d_{\min}}{n} \le (1 - R_c) + \frac{1}{n}                (8-1-106)

where R_c is the code rate. For large n, the factor 1/n can be neglected.

If a code has the largest possible distance, i.e., d_min = n - k + 1, it is
called a maximum-distance-separable code. Except for the trivial
repetition-type codes, there are no binary maximum-distance-separable codes. In
fact, the upper bound in (8-1-106) is extremely loose for binary codes. On the
other hand, nonbinary codes with d_min = n - k + 1 do exist. For example, the
Reed-Solomon codes, which comprise a subclass of BCH codes, are
maximum-distance-separable.

In addition to the upper bound given above, there are several relatively
tight bounds on the minimum distance of linear block codes. We shall briefly
describe four important bounds, three of which are upper bounds and the
other a lower bound. The derivations of these bounds are lengthy and are not
of particular interest in our subsequent discussion. The interested reader may
refer to Chapter 4 of the book by Peterson and Weldon (1972) for those
derivations.

One upper bound on the minimum distance can be obtained from the
inequality in (8-1-83). By taking the logarithm of both sides of (8-1-83) and
dividing by n, we obtain

    1 - R_c \ge \frac{1}{n} \log_2 \sum_{i=0}^{t} \binom{n}{i}    (8-1-107)

Since the error-correcting capability of the code, measured by t, is related to
the minimum distance, the above relation is an upper bound on the minimum
distance. It is called the Hamming upper bound.

The asymptotic form of (8-1-107) is obtained by letting n \to \infty. Now, for
any n, let t_0 be the largest integer t for which (8-1-107) holds. Then, it can
be shown (Peterson and Weldon, 1972) that as n \to \infty, the ratio t/n for
any (n, k) block code cannot exceed t_0/n, where t_0/n satisfies the equation

    1 - R_c = H(t_0/n)                                            (8-1-108)

and H(x) is the binary entropy function defined by (3-2-10).

The generalization of the Hamming bound to nonbinary codes is simply

    1 - R_c \ge \frac{1}{n} \log_q [ \sum_{i=0}^{t} \binom{n}{i} (q - 1)^i ]    (8-1-109)

Another upper bound, developed by Plotkin (1960), may be stated as
follows. The number of check digits required to achieve a minimum distance
d_min in an (n, k) linear block code satisfies the inequality

    n - k \ge \frac{q d_{\min} - 1}{q - 1} - 1 - \log_q d_{\min}  (8-1-110)

where q is the alphabet size. When the code is binary, (8-1-110) may be
expressed as

    d_{\min} [ 1 - \frac{1}{2 d_{\min}} \log_2 d_{\min} ] \le \frac{1}{2} n ( 1 - R_c + \frac{2}{n} )

In the limit as n \to \infty with d_min/n \le \frac{1}{2}, (8-1-110) reduces to

    d_{\min}/n \le \frac{1}{2}(1 - R_c)                           (8-1-111)

Finally, there is another tight upper bound on the minimum distance
obtained by Elias (Berlekamp, 1968). It may be expressed in its asymptotic
form as

    d_{\min}/n \le 2A(1 - A)                                      (8-1-112)

where the parameter A is related to the code rate through the equation

    R_c = 1 + A \log_2 A + (1 - A) \log_2 (1 - A)                 (8-1-113)

Lower bounds on the minimum distance of (n, k) block codes also exist. In
particular, binary block codes exist that have a normalized minimum distance
that asymptotically satisfies the inequality

    d_{\min}/n \ge \alpha                                         (8-1-114)

where \alpha is related to the code rate through the equation

    R_c = 1 - H(\alpha)
        = 1 + \alpha \log_2 \alpha + (1 - \alpha) \log_2 (1 - \alpha)    (8-1-115)

This lower bound is a special case of a lower bound developed by Gilbert
(1952) and Varshamov (1957), which applies to nonbinary and binary block
codes.
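
The four asymptotic bounds can be evaluated at a given code rate by inverting the binary entropy function numerically. The sketch below is illustrative: the function names are hypothetical, the inversion uses plain bisection on [0, 1/2], and the Hamming entry assumes d_min/n is approximately 2t/n for large n.

```python
from math import log2


def binary_entropy(x: float) -> float:
    """H(x) = -x log2 x - (1 - x) log2 (1 - x), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def inverse_entropy(h: float, iters: int = 100) -> float:
    """Solve H(x) = h for x in [0, 1/2] by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if binary_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def asymptotic_bounds(rate: float) -> dict:
    """Bounds on d_min/n for binary codes of rate R_c as n grows large."""
    x = inverse_entropy(1 - rate)   # solves 1 - R_c = H(x)
    return {
        "hamming_upper": 2 * x,             # (8-1-108), assuming d_min ~ 2t
        "plotkin_upper": 0.5 * (1 - rate),  # (8-1-111)
        "elias_upper": 2 * x * (1 - x),     # (8-1-112) with A = x
        "gilbert_varshamov_lower": x,       # (8-1-114) with alpha = x
    }


for name, value in asymptotic_bounds(0.5).items():
    print(name, round(value, 4))
```

Sweeping the rate from 0 to 1 with this function traces the bound curves as functions of code rate.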

The asymptotic bounds given above are plotted in Fig. 8-1-16 for binary
codes. Also plotted in the figure for purposes of comparison are curves of the
minimum distance as a function of code rate for BCH codes of block lengths
n = 31 and 63. We observe that for n = 31 and 63, the normalized minimum
distance falls well above the Varshamov-Gilbert lower bound. As the block
length n increases, the efficiency of the BCH codes diminishes. For example,
when n = 1023, the curve for the normalized minimum distance falls close to
the Varshamov-Gilbert bound. As n increases beyond n = 1023, the normalized
minimum distance of the BCH codes continues to decrease and falls below
the Varshamov-Gilbert bound. That is, d_min/n approaches zero as n tends to
infinity. Consequently the BCH codes, which are the most important class of
cyclic codes, are not very efficient at large block lengths.

FIGURE 8-1-16 Upper and lower bounds on normalized minimum distance as a
function of code rate.


8-1-8 Nonbinary Block Codes and Concatenated Block 
Codes 

A nonbinary block code consists of a set of fixed-length code words in which
the elements of the code words are selected from an alphabet of q symbols,
denoted by {0, 1, 2, ..., q - 1}. Usually, q = 2^k, so that k information bits
are mapped into one of the q symbols. The length of the nonbinary code word is
denoted by N and the number of information symbols encoded into a block of
N symbols is denoted by K. The minimum distance of the nonbinary code is
denoted by D_min. A systematic (N, K) block code consists of K information
symbols and N - K parity check symbols.

Among the various types of nonbinary linear block codes, the Reed-Solomon
codes are some of the most important for practical applications. As
indicated previously, they comprise a subset of the BCH codes, which in turn
are a class of cyclic codes. These codes are described by the parameters

    N = q - 1 = 2^k - 1
    K = 1, 2, 3, ..., N - 1
    D_{\min} = N - K + 1                                          (8-1-116)
    R_c = K/N

Such a code is guaranteed to correct up to

    t = \lfloor \frac{1}{2}(D_{\min} - 1) \rfloor
      = \lfloor \frac{1}{2}(N - K) \rfloor                        (8-1-117)

symbol errors. Of course, these codes may be extended or shortened in the
manner described previously for binary block codes.

The weight distribution {A_i} of the class of Reed-Solomon codes is known.
The coefficients in the weight enumerating polynomial are given as

    A_i = \binom{N}{i} (q - 1) \sum_{j=0}^{i-D} (-1)^j \binom{i-1}{j} q^{i-j-D},   i \ge D_{\min}    (8-1-118)

where D = D_min and q = 2^k.
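
The parameters in (8-1-116) and (8-1-117) and the weight distribution formula for maximum-distance-separable codes are straightforward to evaluate with exact integer arithmetic. The sketch below is illustrative; the function names are hypothetical, and the small (7, 3) code over an eight-symbol alphabet is chosen only to keep the numbers readable.

```python
from math import comb


def rs_parameters(k_bits: int, K: int):
    """Return (q, N, D_min, t) for a Reed-Solomon code with q = 2^k symbols."""
    q = 2**k_bits
    N = q - 1
    d_min = N - K + 1
    t = (d_min - 1) // 2
    return q, N, d_min, t


def rs_weight_distribution(k_bits: int, K: int) -> dict:
    """Weight distribution {i: A_i} from (8-1-118), including A_0 = 1."""
    q, N, D, _ = rs_parameters(k_bits, K)
    dist = {0: 1}
    for i in range(D, N + 1):
        inner = sum(
            (-1) ** j * comb(i - 1, j) * q ** (i - j - D) for j in range(i - D + 1)
        )
        dist[i] = comb(N, i) * (q - 1) * inner
    return dist


print(rs_parameters(5, 15))             # N = 31 code with K = 15
dist = rs_weight_distribution(3, 3)     # (7, 3) code, q = 8
print(dist, sum(dist.values()) == 8**3)
```

A useful sanity check is that the coefficients, together with A_0 = 1, sum to the total number of code words q^K.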

One reason for the importance of the Reed-Solomon codes is their good
distance properties. A second reason for their importance is the existence of
efficient hard-decision decoding algorithms, which make it possible to
implement relatively long codes in many practical applications where coding is
desirable.

A nonbinary code is particularly matched to an M-ary modulation technique
for transmitting the 2^k possible symbols. Specifically, M-ary orthogonal
signaling, e.g., M-ary FSK, is frequently used. Each of the 2^k symbols in the
q-ary alphabet is mapped to one of the M = 2^k orthogonal signals. Thus, the
transmission of a code word is accomplished by transmitting N orthogonal
signals, where each signal is selected from the set of M = 2^k possible signals.

The optimum demodulator for such a signal corrupted by AWGN consists
of M matched filters (or cross-correlators) whose outputs are passed to the
decoder, either in the form of soft decisions or in the form of hard decisions.
If hard decisions are made by the demodulator, the symbol error probability
P_M and the code parameters are sufficient to characterize the performance of
the decoder. In fact, the modulator, the AWGN channel, and the demodulator
form an equivalent discrete (M-ary) input, discrete (M-ary) output, symmetric
memoryless channel characterized by the transition probabilities
P_c = 1 - P_M and P_M/(M - 1). This channel model, which is illustrated in
Fig. 8-1-17, is a generalization of the BSC.

The performance of the hard-decision decoder may be characterized by the 
following upper bound on the code word error probability: 

$$P_e \le \sum_{i=t+1}^{N} \binom{N}{i} P_M^{\,i} (1 - P_M)^{N-i}$$ (8-1-119)

where t is the number of errors guaranteed to be corrected by the code. 

When a code word error is made, the corresponding symbol error 
probability is 

$$P_{es} = \frac{1}{N} \sum_{i=t+1}^{N} i \binom{N}{i} P_M^{\,i} (1 - P_M)^{N-i}$$ (8-1-120)


FIGURE 8-1-17 M-ary input, M-ary output, symmetric memoryless
channel.

FIGURE 8-1-18


Furthermore, if the symbols are converted to binary digits, the bit error
probability corresponding to (8-1-120) is

$$P_{eb} = \frac{2^{k-1}}{2^k - 1} P_{es}$$ (8-1-121)


Example 8-1-13 

Let us evaluate the performance of an $N = 2^5 - 1 = 31$ Reed-Solomon code
with $D_{\min}$ = 3, 5, 9, and 17. The corresponding values of K are 29, 27, 23,
and 15. The modulation is M = 32 orthogonal FSK with noncoherent
detection at the receiver.

The probability of a symbol error is given by (5-4-46), and may be
expressed as

$$P_M = \frac{1}{M} e^{-\gamma} \sum_{n=2}^{M} (-1)^n \binom{M}{n} e^{\gamma/n}$$ (8-1-122)

where $\gamma$ is the SNR per code symbol. By using (8-1-122) in (8-1-120) and
combining the result with (8-1-121), we obtain the bit error probability. The
results of these computations are plotted in Fig. 8-1-18. Note that the more
powerful codes (large $D_{\min}$) give poorer performance at low SNR per bit
than the weaker codes. On the other hand, at high SNR, the more powerful
codes give better performance. Hence, there are crossovers among the
various codes, as illustrated for example in Fig. 8-1-18 for the t = 1 and t = 8
codes. Crossovers also occur among the t = 1, 2, and 4 codes at smaller
values of SNR per bit. Similarly, the curves for t = 4 and 8 and for t = 8 and
2 cross in the region of high SNR. This is the characteristic behavior for
noncoherent detection of the coded waveforms.
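The chain of computations in this example, (8-1-122) into (8-1-120) and then (8-1-121), can be sketched as follows. This is only an illustration: the function names and the sample SNR value are assumptions, and the SNR is taken per code symbol, as in (8-1-122), rather than per bit.

```python
from math import comb, exp


def fsk_symbol_error(M, gamma):
    """P_M for noncoherent M-ary orthogonal FSK, in the form of (8-1-122)."""
    total = sum((-1) ** n * comb(M, n) * exp(gamma / n) for n in range(2, M + 1))
    return exp(-gamma) * total / M


def block_error_bound(N, t, p):
    """Upper bound (8-1-119) on the code word error probability."""
    return sum(comb(N, i) * p ** i * (1 - p) ** (N - i) for i in range(t + 1, N + 1))


def symbol_error_after_decoding(N, t, p):
    """Symbol error probability (8-1-120), given that decoding fails beyond t errors."""
    return sum(i * comb(N, i) * p ** i * (1 - p) ** (N - i)
               for i in range(t + 1, N + 1)) / N


def bit_error_after_decoding(k, p_es):
    """Bit error probability (8-1-121) for k-bit symbols."""
    return 2 ** (k - 1) / (2 ** k - 1) * p_es


if __name__ == "__main__":
    N, k, M = 31, 5, 32
    gamma = 8.0                      # SNR per code symbol (hypothetical sample value)
    p_m = fsk_symbol_error(M, gamma)
    for d_min in (3, 5, 9, 17):
        t = (d_min - 1) // 2
        p_es = symbol_error_after_decoding(N, t, p_m)
        print(d_min, t, bit_error_after_decoding(k, p_es))
```

To reproduce curves against SNR per bit, the SNR per code symbol would have to be rescaled by the code rate and the number of bits per symbol; that step is left out of this sketch.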


If the demodulator does not make a hard decision on each symbol but,
instead, passes the unquantized matched filter outputs to the decoder,
soft-decision decoding can be performed. This decoding involves the formation
of $q^K = 2^{kK}$ correlation metrics, where each metric corresponds to one of the
$q^K$ code words and consists of a sum of N matched filter outputs corresponding
to the N code symbols. The matched filter outputs may be added coherently, or
they may be envelope-detected and then added, or they may be square-law
detected and then added. If coherent detection is used and the channel noise is
AWGN, the computation of the probability of error is a straightforward
extension of the binary case considered in Section 8-1-4. On the other hand,
when envelope detection or square-law detection and noncoherent combining
are used to form the decision variables, the computation of the decoder
performance is considerably more complicated.

Performance of several N = 31, t-error-correcting Reed-Solomon
codes with 32-ary FSK modulation on an AWGN channel
(noncoherent demodulation).

FIGURE 8-1-19 Block diagram of a communications system employing a concatenated code.

Concatenated Block Codes A concatenated code consists of two separate
codes which are combined to form a larger code. Usually one code is selected
to be nonbinary and the other is binary. The two codes are concatenated as
illustrated in Fig. 8-1-19. The nonbinary (N, K) code forms the outer code and
the binary code forms the inner code. Code words are formed by subdividing a
block of kK information bits into K groups, called symbols, where each symbol
consists of k bits. The K k-bit symbols are encoded into N k-bit symbols by the
outer encoder, as is usually done with a nonbinary code. The inner encoder
takes each k-bit symbol and encodes it into a binary block code of length n.
Thus we obtain a concatenated block code having a block length of Nn bits and
containing kK information bits. That is, we have created an equivalent
(Nn, Kk) long binary code. The bits in each code word are transmitted over
the channel by means of PSK or, perhaps, by FSK.

We also indicate that the minimum distance of the concatenated code is
$d_{\min} D_{\min}$, where $D_{\min}$ is the minimum distance of the outer code and
$d_{\min}$ is the minimum distance of the inner code. Furthermore, the rate of the
concatenated code is Kk/Nn, which is equal to the product of the two code rates.

A hard-decision decoder for a concatenated code is conveniently separated 
into an inner decoder and an outer decoder. The inner decoder takes the hard 
decisions on each group of n bits, corresponding to a code word of the inner 
code, and makes a decision on the k information bits based on
maximum-likelihood (minimum-distance) decoding. These k bits represent one
symbol of the outer code. When a block of N k-bit symbols is received from the
inner decoder, the outer decoder makes a hard decision on the K k-bit symbols
based on maximum-likelihood decoding.

Soft-decision decoding is also a possible alternative with a concatenated 
code. Usually, the soft-decision decoding is performed on the inner code, if it is 
selected to have relatively few code words, i.e., if $2^k$ is not too large. The outer
code is usually decoded by means of hard-decision decoding, especially if the 
block length is long and there are many code words. On the other hand, there 
may be a significant gain in performance when soft-decision decoding is used 
on both the outer and inner codes, to justify the additional decoding 
complexity. This is the case in digital communications over fading channels, as
we shall demonstrate in Chapter 14. 

We conclude this subsection with the following example. 


Example 8-1-14 

Suppose that the (7,4) Hamming code described in Examples 8-1-1 and 
8-1-2 is used as the inner code in a concatenated code in which the outer 
code is a Reed-Solomon code. Since k = 4, we select the length of the
Reed-Solomon code to be $N = 2^4 - 1 = 15$. The number of information
symbols K per outer code word may be selected over the range $1 \le K \le 14$
in order to achieve a desired code rate.
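The parameters of the resulting concatenated code follow directly from the relations given earlier in this subsection. A minimal sketch, assuming the inner (7,4) Hamming code has minimum distance 3 and using a hypothetical helper name:

```python
from fractions import Fraction


def concatenated_parameters(N, K, n, k, d_inner):
    """Parameters of an outer (N, K) Reed-Solomon code with an inner (n, k) binary code."""
    D_outer = N - K + 1                       # Reed-Solomon minimum distance
    return {
        "block_length": N * n,                # Nn coded bits
        "info_bits": K * k,                   # Kk information bits
        "rate": Fraction(K * k, N * n),       # product of the two code rates
        "min_distance": d_inner * D_outer,    # d_min * D_min
    }


if __name__ == "__main__":
    # Inner (7,4) Hamming code, outer N = 15 Reed-Solomon code, K swept over 1..14.
    for K in range(1, 15):
        print(K, concatenated_parameters(15, K, 7, 4, 3))
```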


8-1-9 Interleaving of Coded Data for Channels with Burst 
Errors 

Most of the well-known codes that have been devised for increasing the 
reliability in the transmission of information are effective when the errors 
caused by the channel are statistically independent. This is the case for the 
AWGN channel. However, there are channels that exhibit bursty error 
characteristics. One example is the class of channels characterized by multipath 
and fading, which is described in detail in Chapter 14. Signal fading due to 
time-variant multipath propagation often causes the signal to fall below the 
noise level, thus resulting in a large number of errors. A second example is the 
class of magnetic recording channels (tape or disk) in which defects in the 
recording media result in clusters of errors. Such error clusters are not usually 
corrected by codes that are optimally designed for statistically independent 
errors. 

Considerable work has been done on the construction of codes that are 
capable of correcting burst errors. Probably the best known burst error 
correcting codes are the subclass of cyclic codes called Fire codes, named after 
P. Fire (1959), who discovered them. Another class of cyclic codes for burst
error correction was subsequently discovered by Burton (1969).

FIGURE 8-1-20 Block diagram of system employing interleaving for burst-error channel.

FIGURE 8-1-21


A burst of errors of length b is defined as a sequence of b-bit errors, the first
and last of which are 1's. The burst error correction capability of a code is
defined as one less than the length of the shortest uncorrectable burst. It is
relatively easy to show that a systematic (n, k) code, which has n - k parity
check bits, can correct bursts of length $b \le \lfloor \tfrac{1}{2}(n - k) \rfloor$.

An effective method for dealing with burst error channels is to interleave 
the coded data in such a way that the bursty channel is transformed into a 
channel having independent errors. Thus, a code designed for independent 
channel errors (short bursts) is used. 

A block diagram of a system that employs interleaving is shown in Fig. 
8-1-20. The encoded data are reordered by the interleaver and transmitted 
over the channel. At the receiver, after (either hard- or soft-decision) 
demodulation, the deinterleaver puts the data in proper sequence and passes it 
to the decoder. As a result of the interleaving/deinterleaving, error bursts are 
spread out in time so that errors within a code word appear to be independent. 

The interleaver can take one of two forms: a block structure or a 
convolutional structure. A block interleaver formats the encoded data in a 
rectangular array of m rows and n columns. Usually, each row of the array 
constitutes a code word of length n. An interleaver of degree m consists of m 
rows (m code words) as illustrated in Fig. 8-1-21. The bits are read out
column-wise and transmitted over the channel. At the receiver, the
deinterleaver stores the data in the same rectangular array format, but it is
read out row-wise, one code word at a time. As a result of this reordering of
the data during transmission, a burst of errors of length l = mb is broken up
into m bursts of length b. Thus, an (n, k) code that can handle burst errors of
length $b \le \lfloor \tfrac{1}{2}(n - k) \rfloor$ can be combined with an interleaver of degree m to
create an interleaved (mn, mk) block code that can handle bursts of length mb.

A block interleaver for coded data.
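The effect of a block interleaver on a channel burst can be illustrated with a short sketch. The array dimensions, burst position, and function names below are arbitrary choices for illustration, not values from the text.

```python
def interleave(bits, m, n):
    """Write m code words of length n row-wise, then read the array column-wise."""
    assert len(bits) == m * n
    rows = [bits[r * n:(r + 1) * n] for r in range(m)]
    return [rows[r][c] for c in range(n) for r in range(m)]


def deinterleave(bits, m, n):
    """Write the received stream column-wise, then read the array row-wise."""
    assert len(bits) == m * n
    out = [None] * (m * n)
    idx = 0
    for c in range(n):
        for r in range(m):
            out[r * n + c] = bits[idx]
            idx += 1
    return out


def errors_per_code_word(error_pattern, m, n):
    """Count channel errors landing in each of the m code words after deinterleaving."""
    restored = deinterleave(error_pattern, m, n)
    return [sum(restored[r * n:(r + 1) * n]) for r in range(m)]


if __name__ == "__main__":
    m, n, b = 4, 7, 2
    burst = [0] * (m * n)
    for pos in range(5, 5 + m * b):      # one channel burst of length m*b
        burst[pos] = 1
    print(errors_per_code_word(burst, m, n))   # each code word sees b errors
```

Any run of mb consecutive channel positions visits each row index exactly b times, which is why each code word receives a burst of length b.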

A convolutional interleaver can be used in place of a block interleaver in 
much the same way. Convolutional interleavers are better matched for use 
with the class of convolutional codes that is described in the following section. 
Convolutional interleaver structures have been described by Ramsey (1970) 
and Forney (1971). 


8-2 CONVOLUTIONAL CODES 

A convolutional code is generated by passing the information sequence to be 
transmitted through a linear finite-state shift register. In general, the shift 
register consists of K (k-bit) stages and n linear algebraic function generators,
as shown in Fig. 8-2-1. The input data to the encoder, which is assumed to be
binary, is shifted into and along the shift register k bits at a time. The number
of output bits for each k-bit input sequence is n bits. Consequently, the code
rate is defined as $R_c = k/n$, consistent with the definition of the code rate for a
block code. The parameter K is called the constraint length of the
convolutional code.†


FIGURE 8-2-1 



† In many cases, the constraint length of the code is given in bits rather than k-bit bytes. Hence
the shift register may be called an L-stage shift register, where L = Kk. Furthermore, L may not be a
multiple of k, in general.


FIGURE 8-2-2 K = 3, k = 1, n = 3 convolutional encoder.



One method for describing a convolutional code is to give its generator 
matrix, just as we did for block codes. In general, the generator matrix for a 
convolutional code is semi-infinite since the input sequence is semi-infinite in 
length. As an alternative to specifying the generator matrix, we shall use a 
functionally equivalent representation in which we specify a set of n vectors, 
one vector for each of the n modulo-2 adders. Each vector has Kk dimensions 
and contains the connections of the encoder to that modulo-2 adder. A 1 in the
ith position of the vector indicates that the corresponding stage in the shift
register is connected to the modulo-2 adder and a 0 in a given position 
indicates that no connection exists between that stage and the modulo-2 adder. 

To be specific, let us consider the binary convolutional encoder with 
constraint length K = 3, k = 1, and n = 3, which is shown in Fig. 8-2-2. Initially, 
the shift register is assumed to be in the all-zero state. Suppose the first input 
bit is a 1. Then the output sequence of 3 bits is 111. Suppose the second bit is a 
0. The output sequence will then be 001. If the third bit is a 1, the output will
be 100, and so on. Now, suppose we number the outputs of the function 
generators that generate each three-bit output sequence as 1, 2, and 3, from 
top to bottom, and similarly number each corresponding function generator. 
Then, since only the first stage is connected to the first function generator (no 
modulo-2 adder is needed), the generator is 


$$g_1 = [100]$$

The second function generator is connected to stages 1 and 3. Hence

$$g_2 = [101]$$

Finally,

$$g_3 = [111]$$


The generators for this code are more conveniently given in octal form as 
(4,5,7). We conclude that, when k = 1, we require n generators, each of
dimension K, to specify the encoder.

For a rate k/n binary convolutional code with k > 1 and constraint length K,
the n generators are Kk-dimensional vectors, as stated above. The following
example illustrates the case in which k = 2 and n = 3.

FIGURE 8-2-3 K = 2, k = 2, n = 3 convolutional encoder.


Example 8-2-1

Consider the rate 2/3 convolutional encoder illustrated in Fig. 8-2-3. In this
encoder, two bits at a time are shifted into it and three output bits are
generated. The generators are

$$g_1 = [1011], \qquad g_2 = [1101], \qquad g_3 = [1010]$$

In octal form, these generators are (13, 15, 12). 
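The generator description translates directly into an encoder. The following is a minimal sketch, not a reference implementation: the function names are illustrative, and the ordering of the k input bits within the first stage is an assumption.

```python
def octal_to_taps(octal_str, length):
    """Convert an octal generator such as '13' into a tuple of 0/1 taps, stage 1 first."""
    value = int(octal_str, 8)
    return tuple((value >> (length - 1 - i)) & 1 for i in range(length))


def conv_encode(bits, generators, k):
    """Encode bits (length a multiple of k) with a rate k/n convolutional encoder.

    generators: list of n tap tuples, each of length K*k.
    The register starts in the all-zero state; each step shifts in k new bits.
    """
    L = len(generators[0])
    register = [0] * L
    out = []
    for start in range(0, len(bits), k):
        block = list(bits[start:start + k])
        register = block + register[:L - k]       # new bits enter the first stage
        for g in generators:
            out.append(sum(gi & ri for gi, ri in zip(g, register)) % 2)
    return out


if __name__ == "__main__":
    g_rate13 = [octal_to_taps(o, 3) for o in ("4", "5", "7")]
    print(conv_encode([1, 0, 1], g_rate13, k=1))   # K = 3, k = 1, n = 3 encoder

    g_rate23 = [octal_to_taps(o, 4) for o in ("13", "15", "12")]
    print(conv_encode([1, 0], g_rate23, k=2))      # K = 2, k = 2, n = 3 encoder
```

With the rate 1/3 generators (4, 5, 7) and input 1, 0, 1 from the all-zero state, this sketch yields the output triples 111, 001, 100.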

There are three alternative methods that are often used to describe a 
convolutional code. These are the tree diagram, the trellis diagram, and the
state diagram. For example, the tree diagram for the convolutional encoder
shown in Fig. 8-2-2 is illustrated in Fig. 8-2-4. Assuming that the encoder is in
the all-zero state initially, the diagram shows that, if the first input bit is a 0,
the output sequence is 000 and, if the first bit is a 1, the output sequence is 111.
Now, if the first input bit is a 1 and the second bit is a 0, the second set of three
output bits is 001. Continuing through the tree, we see that if the third bit is a
0 then the output is 011, while if the third bit is a 1 then the output is 100.
Given that a particular sequence has taken us to a particular node in the tree,
the branching rule is to follow the upper branch if the next input bit is a 0 and
the lower branch if the bit is a 1. Thus, we trace a particular path through the
tree that is determined by the input sequence.

FIGURE 8-2-4 Tree diagram for rate 1/3, K = 3 convolutional code.

FIGURE 8-2-5

Close observation of the tree that is generated by the convolutional encoder 
shown in Fig. 8-2-2 reveals that the structure repeats itself after the third stage. 
This behavior is consistent with the fact that the constraint length K = 3. That
is, the three-bit output sequence at each stage is determined by the input bit 
and the two previous input bits, i.e., the two bits contained in the first two 
stages of the shift register. The bit in the last stage of the shift register is shifted 
out at the right and does not affect the output. Thus we may say that the 
three-bit output sequence for each input bit is determined by the input bit and 
the four possible states of the shift register, denoted as a = 00, b = 01, c = 10,
d = 11. If we label each node in the tree to correspond to the four possible
states in the shift register, we find that at the third stage there are two nodes 
with the label a, two with the label b, two with the label c, and two with the 
label d. Now we observe that all branches emanating from two nodes having 
the same label (same state) are identical in the sense that they generate 
identical output sequences. This means that the two nodes having the same 
label can be merged. If we do this to the tree shown in Fig. 8-2-4, we obtain 
another diagram, which is more compact, namely, a trellis. For example, the 
trellis diagram for the convolutional encoder of Fig. 8-2-2 is shown in Fig. 
8-2-5. In drawing this diagram, we use the convention that a solid line denotes 
the output generated by the input bit 0 and a dotted line the output generated 
by the input bit 1. In the example being considered, we observe that, after the 
initial transient, the trellis contains four nodes at each stage, corresponding to 
the four states of the shift register, a, b, c, and d. After the second stage, each
node in the trellis has two incoming paths and two outgoing paths. Of the two
outgoing paths, one corresponds to the input bit 0 and the other to the path
followed if the input bit is a 1.

Trellis diagram for rate 1/3, K = 3 convolutional code.

FIGURE 8-2-6 State diagram for rate 1/3, K = 3 convolutional code.

Since the output of the encoder is determined by the input and the state of
the encoder, an even more compact diagram than the trellis is the state
diagram. The state diagram is simply a graph of the possible states of the
encoder and the possible transitions from one state to another. For example,
the state diagram for the encoder shown in Fig. 8-2-2 is illustrated in Fig. 8-2-6.
This diagram shows that the possible transitions are

$$a \xrightarrow{0} a, \quad a \xrightarrow{1} c, \quad b \xrightarrow{0} a, \quad b \xrightarrow{1} c, \quad c \xrightarrow{0} b, \quad c \xrightarrow{1} d, \quad d \xrightarrow{0} b, \quad d \xrightarrow{1} d,$$

where $\alpha \xrightarrow{1} \beta$ denotes the transition from state $\alpha$ to $\beta$ when the input bit is a 1.
The three bits shown next to each branch in the state diagram represent the
output bits. A dotted line in the graph indicates that the input bit is a 1, while
the solid line indicates that the input bit is a 0.
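The state diagram can be tabulated mechanically from the generators. The sketch below assumes that each state label lists the most recent of the two stored bits first (so that input 1 takes a = 00 to c = 10); the names used are illustrative.

```python
GENERATORS = [(1, 0, 0), (1, 0, 1), (1, 1, 1)]          # octal (4, 5, 7)
STATE_NAMES = {(0, 0): "a", (0, 1): "b", (1, 0): "c", (1, 1): "d"}


def step(state, bit):
    """Return (next_state, output_bits) for one input bit of the K = 3, rate 1/3 encoder."""
    register = (bit,) + state                            # new bit enters stage 1
    output = tuple(sum(g * r for g, r in zip(gen, register)) % 2
                   for gen in GENERATORS)
    return register[:2], output


def build_state_table():
    """Map (state name, input bit) -> (next state name, output bits as a string)."""
    table = {}
    for state, name in STATE_NAMES.items():
        for bit in (0, 1):
            nxt, out = step(state, bit)
            table[(name, bit)] = (STATE_NAMES[nxt], "".join(map(str, out)))
    return table


if __name__ == "__main__":
    for key, value in sorted(build_state_table().items()):
        print(key, "->", value)
```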


Example 8-2-2 

Let us consider the k = 2, rate 2/3 convolutional code described in Example 
8-2-1 and shown in Fig. 8-2-3. The first two input bits may be 00, 01, 10, or 
11. The corresponding output bits are 000, 010, 111, 101. When the next pair 
of input bits enter the encoder, the first pair is shifted to the second stage. 
The corresponding output bits depend on the pair of bits shifted into the 
second stage and the new pair of input bits. Hence, the tree diagram for this 
code, shown in Fig. 8-2-7, has four branches per node, corresponding to the 
four possible pairs of input symbols. Since the constraint length of the code 
is K =2, the tree begins to repeat after the second stage. As illustrated in 
Fig. 8-2-7, all the branches emanating from nodes labeled a (state a) yield 
identical outputs. By merging the nodes having identical labels, we obtain 
the trellis, which is shown in Fig. 8-2-8. Finally, the state diagram for this 
code is shown in Fig. 8-2-9.

To generalize, we state that a rate k/n, constraint length K, convolutional
code is characterized by $2^k$ branches emanating from each node of the tree
diagram. The trellis and the state diagrams each have $2^{k(K-1)}$ possible states.
There are $2^k$ branches entering each state and $2^k$ branches leaving each state
(in the trellis and tree, this is true after the initial transient).

The three types of diagrams described above are also used to represent 
nonbinary convolutional codes. When the number of symbols in the code 
alphabet is $q = 2^k$, k > 1, the resulting nonbinary code may also be represented
as an equivalent binary code. The following example considers a convolutional
code of this type.

Example 8-2-3 

Let us consider the convolutional code generated by the encoder shown in
Fig. 8-2-10. This code may be described as a binary convolutional code with
parameters K = 2, k = 2, n = 4, $R_c = 1/2$, and having the generators

$$g_1 = [1010], \qquad g_2 = [0101], \qquad g_3 = [1110], \qquad g_4 = [1001]$$

FIGURE 8-2-8 Trellis diagram for K = 2, k = 2, n = 3 convolutional code.

FIGURE 8-2-9 State diagram for K = 2, k = 2, n = 3 convolutional code.

FIGURE 8-2-10 K = 2, k = 2, n = 4 convolutional encoder.



Except for the difference in rate, this code is similar in form to the rate 2/3, 
k =2 convolutional code considered in Example 8-2-1. 

Alternatively, the code generated by the encoder in Fig. 8-2-10 may be
described as a nonbinary (q = 4) code with one quaternary symbol as an
input and two quaternary symbols as an output. In fact, if the output of the
encoder is treated by the modulator and demodulator as q-ary (q = 4)
symbols that are transmitted over the channel by means of some M-ary
(M = 4) modulation technique, the code is appropriately viewed as 
nonbinary. 

In any case, the tree, the trellis, and the state diagrams are independent 
of how we view the code. That is, this particular code is characterized by a 
tree with four branches emanating from each node, or a trellis with four 
possible states and four branches entering and leaving each state or,
equivalently, by a state diagram having the same parameters as the trellis. 


8-2-1 The Transfer Function of a Convolutional Code 

The distance properties and the error rate performance of a convolutional code 
can be obtained from its state diagram. Since a convolutional code is linear, the 
set of Hamming distances of the code sequences generated up to some stage in 
the tree, from the all-zero code sequence, is the same as the set of distances of 
the code sequences with respect to any other code sequence. Consequently, we 
assume without loss of generality that the all-zero code sequence is the input to 
the encoder. 

The state diagram shown in Fig. 8-2-6 will be used to demonstrate the
method for obtaining the distance properties of a convolutional code. First, we
label the branches of the state diagram as either $D^0 = 1$, $D^1$, $D^2$, or $D^3$, where
the exponent of D denotes the Hamming distance of the sequence of output
bits corresponding to each branch from the sequence of output bits
corresponding to the all-zero branch. The self-loop at node a can be eliminated,
since it contributes nothing to the distance properties of a code sequence
relative to the all-zero code sequence. Furthermore, node a is split into two
nodes, one of which represents the input and the other the output of the state
diagram. Figure 8-2-11 illustrates the resulting diagram. We use this diagram,
which now consists of five nodes because node a was split into two, to write
the four state equations

$$\begin{aligned}
X_c &= D^3 X_a + D X_b \\
X_b &= D X_c + D X_d \\
X_d &= D^2 X_c + D^2 X_d \\
X_e &= D^2 X_b
\end{aligned}$$ (8-2-1)

FIGURE 8-2-11 State diagram for rate 1/3, K = 3 convolutional code.


The transfer function for the code is defined as $T(D) = X_e/X_a$. By solving
the state equations given above, we obtain

$$T(D) = \frac{D^6}{1 - 2D^2} = D^6 + 2D^8 + 4D^{10} + 8D^{12} + \cdots = \sum_{d=6}^{\infty} a_d D^d$$ (8-2-2)

where, by definition,

$$a_d = \begin{cases} 2^{(d-6)/2} & (\text{even } d) \\ 0 & (\text{odd } d) \end{cases}$$ (8-2-3)


The transfer function for this code indicates that there is a single path of
Hamming distance d = 6 from the all-zero path that merges with the all-zero
path at a given node. From the state diagram shown in Fig. 8-2-6 or the trellis
diagram shown in Fig. 8-2-5, it is observed that the d = 6 path is acbe. There is
no other path from node a to node e having a distance d = 6. The second term
in (8-2-2) indicates that there are two paths from node a to node e having a
distance d = 8. Again, from the state diagram or the trellis, we observe that
these paths are acdbe and acbcbe. The third term in (8-2-2) indicates that there
are four paths of distance d = 10, and so forth. Thus the transfer function gives
us the distance properties of the convolutional code. The minimum distance of
the code is called the minimum free distance and denoted by $d_{\text{free}}$. In our
example, $d_{\text{free}} = 6$.

FIGURE 8-2-12 State diagram for rate 1/3, K = 3 convolutional code.
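The path counts given by the transfer function for the rate 1/3, K = 3 code can be checked by counting paths directly on the split state diagram. The sketch below hard-codes the branch distances of that diagram (the exponents of D on each branch); the dictionary layout and function name are illustrative assumptions.

```python
from collections import defaultdict

# Branch distances of the split state diagram: node a is the input, node e the output.
EDGES = {
    "a": [("c", 3)],
    "c": [("b", 1), ("d", 2)],
    "b": [("c", 1), ("e", 2)],
    "d": [("b", 1), ("d", 2)],
}


def distance_spectrum(max_d):
    """Return {d: number of paths from a to e with total Hamming distance d}, d <= max_d."""
    counts = defaultdict(lambda: defaultdict(int))
    counts["a"][0] = 1
    # Every branch has distance >= 1, so sweeping d upward visits each entry once.
    for d in range(max_d + 1):
        for state in ("a", "b", "c", "d"):
            ways = counts[state][d]
            if ways == 0:
                continue
            for nxt, w in EDGES[state]:
                if d + w <= max_d:
                    counts[nxt][d + w] += ways
    return {d: counts["e"][d] for d in range(max_d + 1) if counts["e"][d]}


if __name__ == "__main__":
    print(distance_spectrum(12))   # smallest key is the free distance
```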

The transfer function can be used to provide more detailed information than 
just the distance of the various paths. Suppose we introduce a factor N into all 
branch transitions caused by the input bit 1. Thus, as each branch is traversed,
the cumulative exponent on N increases by one only if that branch transition is 
due to an input bit 1. Furthermore, we introduce a factor of J into each branch 
of the state diagram so that the exponent of J will serve as a counting variable 
to indicate the number of branches in any given path from node a to node e. 
For the rate 1/3 convolutional code in our example, the state diagram that 
incorporates the additional factors of J and N is shown in Fig. 8-2-12. 

The state equations for the state diagram shown in Fig. 8-2-12 are 


$$\begin{aligned}
X_c &= JND^3 X_a + JND X_b \\
X_b &= JD X_c + JD X_d \\
X_d &= JND^2 X_c + JND^2 X_d \\
X_e &= JD^2 X_b
\end{aligned}$$ (8-2-4)


Upon solving these equations for the ratio $X_e/X_a$, we obtain the transfer
function

$$T(D, N, J) = \frac{J^3 N D^6}{1 - JND^2(1 + J)} = J^3ND^6 + J^4N^2D^8 + J^5N^2D^8 + J^5N^3D^{10} + 2J^6N^3D^{10} + J^7N^3D^{10} + \cdots$$ (8-2-5)

This form for the transfer functions gives the properties of all the paths in
the convolutional code. That is, the first term in the expansion of T(D, N, J)
indicates that the distance d = 6 path is of length 3 and of the three 
information bits, one is a 1. The second and third terms in the expansion of 
T(D,N,J) indicate that of the two d = 8 terms, one is of length 4 and the 
second has length 5. Two of the four information bits in the path having length 
4 and two of the five information bits in the path having length 5 are 1s. Thus,
the exponent of the factor J indicates the length of the path that merges with
the all-zero path for the first time, the exponent of the factor N indicates the
number of 1s in the information sequence for that path, and the exponent of D
indicates the distance of the sequence of encoded bits for that path from the 
all-zero sequence. 
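The exponents of J, N, and D can also be obtained by brute-force enumeration of the paths that leave the all-zero state and return to it for the first time. The following sketch derives the branch weights from the generators (4, 5, 7) of the rate 1/3 code; the function names and the state convention (most recent bit first) are assumptions for illustration.

```python
from collections import Counter

GENERATORS = [(1, 0, 0), (1, 0, 1), (1, 1, 1)]     # octal (4, 5, 7)


def branch(state, bit):
    """Return (next_state, Hamming weight of the three output bits)."""
    register = (bit,) + state
    weight = sum(sum(g * r for g, r in zip(gen, register)) % 2 for gen in GENERATORS)
    return register[:2], weight


def first_return_terms(max_len):
    """Counter keyed by (J, N, D) exponents for first-return paths of length <= max_len."""
    terms = Counter()
    start, w0 = branch((0, 0), 1)                  # leave the all-zero state with input 1
    frontier = [(start, 1, 1, w0)]                 # (state, J, N, D) accumulated so far
    while frontier:
        state, j, n, d = frontier.pop()
        if j == max_len:
            continue
        for bit in (0, 1):
            nxt, w = branch(state, bit)
            item = (j + 1, n + bit, d + w)
            if nxt == (0, 0):
                terms[item] += 1                   # path has remerged with the all-zero path
            else:
                frontier.append((nxt,) + item)
    return terms


if __name__ == "__main__":
    for (j, n, d), count in sorted(first_return_terms(7).items()):
        print(f"{count} * J^{j} N^{n} D^{d}")
```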

The factor J is particularly important if we are transmitting a sequence of 
finite duration, say m bits. In such a case, the convolutional code is truncated 
after m nodes or m branches. This implies that the transfer function for the 
truncated code is obtained by truncating T(D,N,J) at the term J m . On the 
other hand, if we are transmitting an extremely long sequence, i.e., essentially 
an infinite-length sequence, we may wish to suppress the dependence of 
T(D,N,J) on the parameter J. This is easily accomplished by setting J = 1.
Hence, for the example given above, we have

$$T(D, N, 1) = T(D, N) = \frac{ND^6}{1 - 2ND^2} = ND^6 + 2N^2D^8 + 4N^3D^{10} + \cdots = \sum_{d=6}^{\infty} a_d N^{(d-4)/2} D^d$$ (8-2-6)

where the coefficients $\{a_d\}$ are defined by (8-2-3).

The procedure outlined above for determining the transfer function of a 
binary convolutional code is easily extended to nonbinary codes. In the 
following example, we determine the transfer function of the nonbinary 
convolutional code previously introduced in Example 8-2-3. 


Example 8-2-4 

The convolutional code shown in Fig. 8-2-10 has the parameters K = 2, 
k = 2, n = 4. In this example, we have a choice of how we label distances
and count errors, depending on whether we treat the code as binary or 
nonbinary. Suppose we treat the code as nonbinary. Thus, the input to the 
encoder and the output are treated as quaternary symbols. In particular, if 
we treat the input and output as quaternary symbols 00, 01, 10, and 11, the 
distance measured in symbols between the sequences 0111 and 0000 is 2. 
Furthermore, suppose that an input symbol 00 is decoded as the symbol 11; 
then we have made one symbol error. This convention applied to the
convolutional code shown in Fig. 8-2-10 results in the state diagram
illustrated in Fig. 8-2-13, from which we obtain the state equations

$$\begin{aligned}
X_b &= NJD^2 X_a + NJD X_b + NJD X_c + NJD^2 X_d \\
X_c &= NJD^2 X_a + NJD^2 X_b + NJD X_c + NJD X_d \\
X_d &= NJD^2 X_a + NJD X_b + NJD^2 X_c + NJD X_d \\
X_e &= JD^2 (X_b + X_c + X_d)
\end{aligned}$$ (8-2-7)

Solution of these equations leads to the transfer function

$$T(D, N, J) = \frac{3NJ^2D^4}{1 - 2NJD - NJD^2}$$ (8-2-8)

FIGURE 8-2-13 State diagram for K = 2, k = 2, rate 1/2 nonbinary code.


This expression for the transfer function is particularly appropriate when the
quaternary symbols at the output of the encoder are mapped into a
corresponding set of quaternary waveforms $s_m(t)$, m = 1, 2, 3, 4, e.g., four
orthogonal waveforms. Thus, there is a one-to-one correspondence between
code symbols and signal waveforms.

FIGURE 8-2-14 State diagram for K = 2, k = 2, rate 1/2 convolutional code with output treated as a binary
sequence.

Alternatively, for example, the output of the encoder may be transmitted 
as a sequence of binary digits by means of binary PSK. In such a case, it is 
appropriate to measure distance in terms of bits. When this convention is 
employed, the state diagram is labeled as shown in Fig. 8-2-14. Solution of 
the state equations obtained from this state diagram yields a transfer 
function that is different from the one given in (8-2-8). 
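The two labeling conventions in this example can be compared by computing the branch distances both ways from the generators of Example 8-2-3. In the sketch below, grouping the four output bits into two consecutive pairs to form the quaternary symbols is an assumption, as are the helper names; the state is taken to be the previous input pair, with a = 00, b = 01, c = 10, d = 11.

```python
GENERATORS = [(1, 0, 1, 0), (0, 1, 0, 1), (1, 1, 1, 0), (1, 0, 0, 1)]
NAMES = {(0, 0): "a", (0, 1): "b", (1, 0): "c", (1, 1): "d"}


def branch_outputs(state, inp):
    """Four output bits when input pair inp enters an encoder holding pair state."""
    register = inp + state
    return tuple(sum(g * r for g, r in zip(gen, register)) % 2 for gen in GENERATORS)


def symbol_weight(bits):
    """Number of nonzero quaternary symbols when bits are grouped in pairs."""
    return sum(1 for i in range(0, len(bits), 2) if bits[i:i + 2] != (0, 0))


def branch_distances():
    """Return (symbol_dist, bit_dist), each keyed by (from_state, to_state)."""
    symbol_dist, bit_dist = {}, {}
    for state, s_name in NAMES.items():
        for inp, n_name in NAMES.items():      # for K = 2 the next state is the input pair
            out = branch_outputs(state, inp)
            symbol_dist[(s_name, n_name)] = symbol_weight(out)
            bit_dist[(s_name, n_name)] = sum(out)
    return symbol_dist, bit_dist


if __name__ == "__main__":
    sym, bit = branch_distances()
    for key in sorted(sym):
        print(key, "symbol distance", sym[key], "bit distance", bit[key])
```

Under these assumptions the symbol distances match the exponents of D in (8-2-7), while the bit distances give a different branch labeling, which is why the two conventions lead to different transfer functions.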

Some convolutional codes exhibit a characteristic behavior that is called 
catastrophic error propagation. When a code that has this characteristic is used 
on a binary symmetric channel, it is possible for a finite number of channel 
errors to cause an infinite number of decoding errors. Such a code can be
identified from its state diagram. It will contain a zero-distance path (a path
with multiplier $D^0 = 1$) from some nonzero state back to the same state. This
means that one can loop around this zero-distance path an infinite number of
times without increasing the distance relative to the all-zero path. But, if this
self-loop corresponds to the transmission of a 1, the decoder will make an
infinite number of errors. Since such codes are easily recognized, they are
easily avoided in practice.
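The zero-distance loop test can be automated for rate 1/n codes. The sketch below searches the state diagram for a cycle made up only of zero-weight branches, ignoring the self-loop at the all-zero state. The generator pair [110], [101] used as a catastrophic example is chosen here for illustration and does not come from the text; the function name is also hypothetical.

```python
def has_zero_weight_cycle(generators):
    """Check a rate 1/n (k = 1) code for a zero-distance loop away from the all-zero self-loop."""
    K = len(generators[0])
    zero_edges = {}
    for s in range(2 ** (K - 1)):
        state = tuple((s >> (K - 2 - i)) & 1 for i in range(K - 1))
        for bit in (0, 1):
            register = (bit,) + state
            outputs = [sum(g * r for g, r in zip(gen, register)) % 2 for gen in generators]
            nxt = register[:K - 1]
            # Keep zero-weight branches, except the self-loop at the all-zero state.
            if not any(outputs) and (any(state) or any(nxt)):
                zero_edges.setdefault(state, []).append(nxt)

    def reachable_from(start):
        seen, stack = set(), list(zero_edges.get(start, []))
        while stack:
            s = stack.pop()
            if s not in seen:
                seen.add(s)
                stack.extend(zero_edges.get(s, []))
        return seen

    return any(state in reachable_from(state) for state in zero_edges)


if __name__ == "__main__":
    print(has_zero_weight_cycle([(1, 0, 0), (1, 0, 1), (1, 1, 1)]))   # code of Fig. 8-2-2
    print(has_zero_weight_cycle([(1, 1, 0), (1, 0, 1)]))              # illustrative catastrophic code
```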

8-2-2 Optimum Decoding of Convolutional Codes — The 

Viterbi Algorithm 

In the decoding of a block code for a memoryless channel, we computed the
distances (Hamming distance for hard-decision decoding and euclidean
distance for soft-decision decoding) between the received code word and the $2^k$
possible transmitted code words. Then we selected the code word that was
closest in distance to the received code word. This decision rule, which requires
the computation of $2^k$ metrics, is optimum in the sense that it results in a
minimum probability of error for the binary symmetric channel with $p < \tfrac{1}{2}$ and
the additive white gaussian noise channel.

Unlike a block code, which has a fixed length n, a convolutional encoder is 
basically a finite-state machine. Hence the optimum decoder is a maximum- 
likelihood sequence estimator (MLSE) of the type described in Section 5-1-4 
for signals with memory, such as NRZI and CPM. Therefore, optimum
decoding of a convolutional code involves a search through the trellis for the
most probable sequence. Depending on whether the detector following the
demodulator performs hard or soft decisions, the corresponding metric in the
trellis search may be either a Hamming metric or a euclidean metric,
respectively. We elaborate below, using the trellis in Fig. 8-2-5 for the
convolutional code shown in Fig. 8-2-2.

Consider the two paths in the trellis that begin at the initial state a and
remerge at state a after three state transitions (three branches), corresponding
to the two information sequences 000 and 100 and the transmitted sequences
000 000 000 and 111 001 011, respectively. We denote the transmitted bits by
$\{c_{jm},\ j = 1, 2, 3;\ m = 1, 2, 3\}$, where the index j indicates the jth branch and the
index m the mth bit in that branch. Correspondingly, we define
$\{r_{jm},\ j = 1, 2, 3;\ m = 1, 2, 3\}$ as the output of the demodulator. If the detector performs
hard-decision decoding, its output for each transmitted bit is either 0 or 1. On
the other hand, if soft-decision decoding is employed and the coded sequence
is transmitted by binary coherent PSK, the input to the decoder is

$$r_{jm} = \sqrt{\mathcal{E}_c}\,(2c_{jm} - 1) + n_{jm}$$ (8-2-9)

where $n_{jm}$ represents the additive noise and $\mathcal{E}_c$ is the transmitted signal energy
for each code bit.

A metric is defined for the $j$th branch of the $i$th path through the trellis as 
the logarithm of the joint probability of the sequence $\{r_{jm},\ m = 1, 2, 3\}$ 
conditioned on the transmitted sequence $\{c_{jm}^{(i)},\ m = 1, 2, 3\}$ for the 
$i$th path. That is, 

$$ \mu_j^{(i)} = \log P(Y_j \mid C_j^{(i)}), \qquad j = 1, 2, 3, \ldots \tag{8-2-10} $$

Furthermore, a metric for the $i$th path consisting of $B$ branches through the 
trellis is defined as 

$$ PM^{(i)} = \sum_{j=1}^{B} \mu_j^{(i)} \tag{8-2-11} $$

The criterion for deciding between two paths through the trellis is to select 
the one having the larger metric. This rule maximizes the probability of a 
correct decision or, equivalently, it minimizes the probability of error for the 
sequence of information bits. For example, suppose that hard-decision 
decoding is performed by the demodulator, yielding the received sequence 
{101 000 100}. Let $i = 0$ denote the three-branch all-zero path and $i = 1$ the 
second three-branch path that begins in the initial state a and remerges with 
the all-zero path at state a after three transitions. The metrics for these two 
paths are 

$$ PM^{(0)} = 6 \log (1 - p) + 3 \log p $$
$$ PM^{(1)} = 4 \log (1 - p) + 5 \log p \tag{8-2-12} $$

where $p$ is the probability of a bit error. Assuming that $p < \tfrac{1}{2}$, we 
find that the metric $PM^{(0)}$ is larger than the metric $PM^{(1)}$. This result is 
consistent with the observation that the all-zero path is at Hamming distance 
$d = 3$ from the received sequence, while the $i = 1$ path is at Hamming distance 
$d = 5$ from the received path. Thus, the Hamming distance is an equivalent 
metric for hard-decision decoding. 
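
The hard-decision comparison above can be checked numerically. The following Python sketch recomputes the two Hamming distances and the log-likelihood path metrics of (8-2-12). The function names and the value p = 0.1 are illustrative assumptions and do not come from the text.

```python
import math

def hamming_distance(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def path_metric(received, path, p):
    """Log-likelihood path metric on a BSC with crossover probability p."""
    d = hamming_distance(received, path)
    return (len(received) - d) * math.log(1 - p) + d * math.log(p)

received = "101000100"   # received sequence {101 000 100}
path0 = "000000000"      # all-zero path, i = 0
path1 = "111001011"      # path for information sequence 100, i = 1
p = 0.1                  # assumed crossover probability, any p < 1/2 works

d0 = hamming_distance(received, path0)
d1 = hamming_distance(received, path1)
pm0 = path_metric(received, path0, p)
pm1 = path_metric(received, path1, p)
print(d0, d1, pm0 > pm1)
```

The path with the smaller Hamming distance has the larger log-likelihood metric for any p below 1/2, which is why the decoder may work with distances directly.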

Similarly, suppose that soft-decision decoding is employed and the channel 
adds white gaussian noise to the signal. Then the demodulator output is 
described statistically by the probability density function 


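$$ p(r_{jm} \mid c_{jm}^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{\left[ r_{jm} - \sqrt{\mathcal{E}_c}\,(2c_{jm}^{(i)} - 1) \right]^2}{2\sigma^2} \right\} \tag{8-2-13} $$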


where $\sigma^2 = \tfrac{1}{2} N_0$ is the variance of the additive gaussian noise. 
If we neglect the terms that are common to all branch metrics, the branch metric 
for the $j$th branch of the $i$th path may be expressed as 

$$ \mu_j^{(i)} = \sum_{m=1}^{n} r_{jm}\,(2c_{jm}^{(i)} - 1) \tag{8-2-14} $$

where, in our example, $n = 3$. Thus the correlation metrics for the two paths 
under consideration are 

$$ CM^{(0)} = \sum_{j=1}^{3} \sum_{m=1}^{3} r_{jm}\,(2c_{jm}^{(0)} - 1) $$
$$ CM^{(1)} = \sum_{j=1}^{3} \sum_{m=1}^{3} r_{jm}\,(2c_{jm}^{(1)} - 1) \tag{8-2-15} $$
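
As an illustration of the correlation metrics, the Python sketch below evaluates $CM^{(0)}$ and $CM^{(1)}$ for the two three-branch paths. The soft demodulator outputs in `r` are hypothetical numbers chosen only for the example, and the signal energy is normalized to 1 as a further assumption.

```python
def correlation_metric(r, code_bits):
    """Sum of r_jm * (2*c_jm - 1) over all code bits of a path."""
    return sum(x * (2 * c - 1) for x, c in zip(r, code_bits))

def squared_distance(r, code_bits):
    """Squared euclidean distance to the antipodal signal (unit energy assumed)."""
    return sum((x - (2 * c - 1)) ** 2 for x, c in zip(r, code_bits))

# Hypothetical soft outputs for the nine code bits (not taken from the text).
r = [-0.8, 0.3, -1.1, -0.9, -1.2, 0.1, -0.2, -1.0, -0.7]
c0 = [0, 0, 0, 0, 0, 0, 0, 0, 0]   # all-zero path
c1 = [1, 1, 1, 0, 0, 1, 0, 1, 1]   # path 111 001 011

cm0 = correlation_metric(r, c0)
cm1 = correlation_metric(r, c1)
print(cm0, cm1)
```

Because the squared euclidean distance equals a path-independent constant minus twice the correlation, the path with the larger correlation metric is also the one closest to the received sequence.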
Having defined the branch metrics and path metrics computed by the 
decoder, we now consider the use of the Viterbi algorithm for optimum 
decoding of the convolutionally encoded information sequence. We consider 
the two paths described above, which merge at state a after three transitions. 
Note that any particular path through the trellis that stems from this node will 
add identical terms to the path metrics $CM^{(0)}$ and $CM^{(1)}$. Consequently, if 
$CM^{(0)} > CM^{(1)}$ at the merged node a after three transitions, $CM^{(0)}$ will 
continue to be larger than $CM^{(1)}$ for any path that stems from node a. This 
means that the path corresponding to $CM^{(1)}$ can be discarded from further 
consideration. The path corresponding to the metric $CM^{(0)}$ is the survivor. 
Similarly, one of the two paths that merge at state b can be eliminated on the 
basis of the two corresponding metrics. This procedure is repeated at state c and 
state d. As a 
result, after the first three transitions, there are four surviving paths, one 
terminating at each state, and a corresponding metric for each survivor. This 
procedure is repeated at each stage of the trellis as new signals are received in 
subsequent time intervals. 
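
The survivor-selection procedure just described can be sketched in Python for hard-decision decoding. The generator taps (100, 101, 111) are an assumption about the encoder of Fig. 8-2-2, which is not reproduced in this section; they are chosen because they map the information sequence 100 to the transmitted sequence 111 001 011 quoted above. All function names are illustrative, and the sketch keeps full survivor paths rather than a truncated decoding window.

```python
from itertools import product

# Assumed generator taps for the K = 3, rate 1/3 encoder (see lead-in).
G = [(1, 0, 0), (1, 0, 1), (1, 1, 1)]

def branch_output(bit, state):
    """Three code bits produced by input `bit` when the register holds `state`."""
    reg = (bit,) + state
    return tuple(sum(g[i] * reg[i] for i in range(3)) % 2 for g in G)

def encode(bits):
    """Encode an information sequence, starting from the all-zero state."""
    state, out = (0, 0), []
    for b in bits:
        out.extend(branch_output(b, state))
        state = (b, state[0])
    return out

def viterbi_hard(received, n=3):
    """Return (information bits, Hamming distance) of the best surviving path."""
    states = list(product((0, 1), repeat=2))
    inf = float("inf")
    metric = {s: (0 if s == (0, 0) else inf) for s in states}
    paths = {s: [] for s in states}
    for t in range(0, len(received), n):
        r = received[t:t + n]
        new_metric = {s: inf for s in states}
        new_paths = {s: [] for s in states}
        for s in states:
            if metric[s] == inf:
                continue  # state not yet reachable
            for b in (0, 1):
                d = sum(x != y for x, y in zip(branch_output(b, s), r))
                nxt = (b, s[0])
                cand = metric[s] + d
                if cand < new_metric[nxt]:  # keep one survivor per state
                    new_metric[nxt] = cand
                    new_paths[nxt] = paths[s] + [b]
        metric, paths = new_metric, new_paths
    best = min(states, key=lambda s: metric[s])
    return paths[best], metric[best]

message = [1, 0, 1, 1, 0, 0]
rx = encode(message)
rx[4] ^= 1                      # inject a single channel error
decoded, distance = viterbi_hard(rx)
print(decoded, distance)
```

At every stage only one path per state is retained, so the work per stage is fixed regardless of how long the information sequence is.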

In general, when a binary convolutional code with $k = 1$ and constraint 
length $K$ is decoded by means of the Viterbi algorithm, there are $2^{K-1}$ states. 
Hence, there are $2^{K-1}$ surviving paths at each stage and $2^{K-1}$ metrics, one 
for each surviving path. Furthermore, a binary convolutional code in which $k$ bits 
at a time are shifted into an encoder that consists of $K$ ($k$-bit) shift-register 
stages generates a trellis that has $2^{k(K-1)}$ states. Consequently, the decoding 
of such a code by means of the Viterbi algorithm requires keeping track of 
$2^{k(K-1)}$ surviving paths and $2^{k(K-1)}$ metrics. At each stage of the trellis, 
there are $2^k$ paths that merge at each node. Since each path that converges at a 
common node requires the computation of a metric, there are $2^k$ metrics computed 
for each node. Of the $2^k$ paths that merge at each node, only one survives, and 
this is the most-probable (minimum-distance) path. Thus the number of 
computations in decoding performed at each stage increases exponentially with $k$ 
and $K$. The exponential increase in computational burden limits the use of the 
Viterbi algorithm to relatively small values of $K$ and $k$. 
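
To make the growth concrete, the short Python sketch below tabulates the number of states and the number of metric computations per trellis stage from the counts given above. The helper name `viterbi_complexity` is illustrative.

```python
def viterbi_complexity(k, K):
    """Return (number of states, metrics computed per trellis stage)."""
    states = 2 ** (k * (K - 1))          # one survivor and one metric per state
    metrics_per_stage = states * 2 ** k  # 2^k merging paths at each node
    return states, metrics_per_stage

for k, K in [(1, 3), (1, 7), (1, 9), (2, 4)]:
    print(k, K, viterbi_complexity(k, K))
```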

The decoding delay in decoding a long information sequence that has been 
convolutionally encoded is usually too long for most practical applications. 
Moreover, the memory required to store the entire length of surviving 
sequences is large and expensive. As indicated in Section 5-1-4, a solution to 
this problem is to modify the Viterbi algorithm in a way which results in a fixed 
decoding delay without significantly affecting the optimal performance of the 
algorithm. Recall that the modification is to retain at any given time $t$ only the 
most recent $\delta$ decoded information bits (symbols) in each surviving sequence. 
As each new information bit (symbol) is received, a final decision is made on 
the bit (symbol) received $\delta$ branches back in the trellis, by comparing the 
metrics in the surviving sequences and deciding in favor of the bit in the 
sequence having the largest metric. If $\delta$ is chosen sufficiently large, all 
surviving sequences will contain the identical decoded bit (symbol) $\delta$ 
branches back in time. That is, with high probability, all surviving sequences at 
time $t$ stem from the same node at $t - \delta$. It has been found experimentally 
(computer simulation) that a delay $\delta \ge 5K$ results in a negligible 
degradation in the performance relative to the optimum Viterbi algorithm. 

8-2-3 Probability of Error for Soft-Decision Decoding 

The topic of this subsection is the error rate performance of the Viterbi 
algorithm on an additive white gaussian noise channel with soft-decision 
decoding. 

In deriving the probability of error for convolutional codes, the linearity 
property for this class of codes is employed to simplify the derivation. That is, 
we assume that the all-zero sequence is transmitted and we determine the 
probability of error in deciding in favor of another sequence. The coded binary 
digits for the $j$th branch of the convolutional code, denoted as 
$\{c_{jm},\ m = 1, 2, \ldots, n\}$ and defined in Section 8-2-2, are assumed to be 
transmitted by binary PSK (or four-phase PSK) and detected coherently at the 
demodulator. The output of the demodulator, which is the input to the Viterbi 
decoder, is the sequence $\{r_{jm},\ m = 1, 2, \ldots, n;\ j = 1, 2, \ldots\}$, where 
$r_{jm}$ is defined in (8-2-9). 

The Viterbi soft-decision decoder forms the branch metrics defined by 
(8-2-14) and from these computes the path metrics 

$$ CM^{(i)} = \sum_{j=1}^{B} \mu_j^{(i)} = \sum_{j=1}^{B} \sum_{m=1}^{n} r_{jm}\,(2c_{jm}^{(i)} - 1) \tag{8-2-16} $$

where $i$ denotes any one of the competing paths at each node and $B$ is the 
number of branches (information symbols) in a path. For example, the all-zero 
path, denoted as $i = 0$, has a path metric 

$$ CM^{(0)} = \sum_{j=1}^{B} \sum_{m=1}^{n} (-\sqrt{\mathcal{E}_c} + n_{jm})(-1) = \sqrt{\mathcal{E}_c}\,Bn - \sum_{j=1}^{B} \sum_{m=1}^{n} n_{jm} \tag{8-2-17} $$

Since the convolutional code does not necessarily have a fixed length, we 
derive its performance from the probability of error for sequences that merge 
with the all-zero sequence for the first time at a given node in the trellis. In 
particular, we define the first-event error probability as the probability that 
another path that merges with the all-zero path at node $B$ has a metric that 
exceeds the metric of the all-zero path for the first time. Suppose the incorrect 
path, call it $i = 1$, that merges with the all-zero path differs from the all-zero 
path in $d$ bits, i.e., there are $d$ 1s in the path $i = 1$ and the rest are 0s. The 
probability of error in the pairwise comparison of the metrics $CM^{(0)}$ and 
$CM^{(1)}$ is 

$$ P_2(d) = P(CM^{(1)} \ge CM^{(0)}) = P(CM^{(1)} - CM^{(0)} \ge 0) $$
$$ \phantom{P_2(d)} = P\left[ 2 \sum_{j=1}^{B} \sum_{m=1}^{n} r_{jm}\,(c_{jm}^{(1)} - c_{jm}^{(0)}) \ge 0 \right] \tag{8-2-18} $$

Since the coded bits in the two paths are identical except in the $d$ positions, 
(8-2-18) can be written in the simpler form 

$$ P_2(d) = P\left( \sum_{l=1}^{d} r'_l \ge 0 \right) \tag{8-2-19} $$

where the index $l$ runs over the set of $d$ bits in which the two paths differ and 
the set $\{r'_l\}$ represents the input to the decoder for these $d$ bits. 

The $\{r'_l\}$ are independent and identically distributed gaussian random 
variables with mean $-\sqrt{\mathcal{E}_c}$ and variance $\tfrac{1}{2} N_0$. 
Consequently the probability of error in the pairwise comparison of these two 
paths that differ in $d$ bits is 

$$ P_2(d) = Q\left( \sqrt{\frac{2\mathcal{E}_c}{N_0}\, d} \right) = Q\left( \sqrt{2\gamma_b R_c d} \right) \tag{8-2-20} $$

where $\gamma_b = \mathcal{E}_b / N_0$ is the received SNR per bit and $R_c$ is the 
code rate. 
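
A minimal Python sketch of (8-2-20) follows. It expresses the Q function through the complementary error function from the standard library. The function names and the sample operating point (5 dB, rate 1/3, d = 6) are illustrative assumptions.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pairwise_error_soft(d, snr_per_bit, code_rate):
    """P_2(d) of (8-2-20) for coherent binary PSK with soft decisions."""
    return q_function(math.sqrt(2.0 * snr_per_bit * code_rate * d))

snr_per_bit = 10 ** (5.0 / 10)   # 5 dB, as a linear ratio (example value)
print(pairwise_error_soft(6, snr_per_bit, 1 / 3))
```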

Although we have derived the first-event error probability for a path of 
distance d from the all-zero path, there are many possible paths with different 
distances that merge with the all-zero path at a given node B. In fact, the 
transfer function T(D) provides a complete description of all the possible 
paths that merge with the all-zero path at node B and their distances. Thus we 
can sum the error probability in (8-2-20) over all possible path distances. Upon 
performing this summation, we obtain an upper bound on the first-event error 
probability in the form 

$$ P_e \le \sum_{d=d_{\text{free}}}^{\infty} a_d P_2(d) = \sum_{d=d_{\text{free}}}^{\infty} a_d\, Q\left( \sqrt{2\gamma_b R_c d} \right) \tag{8-2-21} $$

where $a_d$ denotes the number of paths of distance $d$ from the all-zero path that 
merge with the all-zero path for the first time. 

There are two reasons why (8-2-21) is an upper bound on the first-event 
error probability. One is that the events that result in the error probabilities 
$\{P_2(d)\}$ are not disjoint. This can be seen from observation of the trellis. 
Second, by summing over all possible $d \ge d_{\text{free}}$, we have implicitly 
assumed that the convolutional code has infinite length. If the code is truncated 
periodically after $B$ nodes, the upper bound in (8-2-21) can be improved by 
summing the error events for $d_{\text{free}} \le d \le B$. This refinement has some 
merit in determining 
the performance of short convolutional codes, but the effect on performance is 
negligible when B is large. 

The upper bound in (8-2-21) can be expressed in a slightly different form if 
the Q function is upper-bounded by an exponential. That is, 

$$ Q\left( \sqrt{2\gamma_b R_c d} \right) \le e^{-\gamma_b R_c d} = D^d \Big|_{D = e^{-\gamma_b R_c}} \tag{8-2-22} $$

If we use (8-2-22) in (8-2-21), the upper bound on the first-event error 
probability can be expressed as 

$$ P_e < T(D) \Big|_{D = e^{-\gamma_b R_c}} \tag{8-2-23} $$

Although the first-event error probability provides a measure of the 
performance of a convolutional code, a more useful measure of performance is 
the bit error probability. This probability can be upper-bounded by the 
procedure used in bounding the first-event error probability. Specifically, we 
know that when an incorrect path is selected, the information bits in which the 
selected path differs from the correct path will be decoded incorrectly. We also 
know that the exponents in the factor $N$ contained in the transfer function 
$T(D, N)$ indicate the number of information bit errors (number of 1s) in 
selecting an incorrect path that merges with the all-zero path at some node $B$. 
If we multiply the pairwise error probability $P_2(d)$ by the number of incorrectly 
decoded information bits for the incorrect path at the node where they merge, 
we obtain the bit error rate for that path. The average bit error probability is 
upper-bounded by multiplying each pairwise error probability $P_2(d)$ by the 
corresponding number of incorrectly decoded information bits, for each 
possible incorrect path that merges with the correct path at the $B$th node, and 
summing over all $d$. 

The appropriate multiplication factors corresponding to the number of 
information bit errors for each incorrectly selected path may be obtained by 
differentiating $T(D, N)$ with respect to $N$. In general, $T(D, N)$ can be 
expressed as 

$$ T(D, N) = \sum_{d=d_{\text{free}}}^{\infty} a_d D^d N^{f(d)} \tag{8-2-24} $$

where $f(d)$ denotes the exponent of $N$ as a function of $d$. Taking the derivative 
of $T(D, N)$ with respect to $N$ and setting $N = 1$, we obtain 

$$ \frac{dT(D, N)}{dN} \bigg|_{N=1} = \sum_{d=d_{\text{free}}}^{\infty} a_d f(d) D^d = \sum_{d=d_{\text{free}}}^{\infty} \beta_d D^d \tag{8-2-25} $$

where $\beta_d = a_d f(d)$. Thus the bit error probability for $k = 1$ is upper-bounded 
by 

$$ P_b < \sum_{d=d_{\text{free}}}^{\infty} \beta_d P_2(d) = \sum_{d=d_{\text{free}}}^{\infty} \beta_d\, Q\left( \sqrt{2\gamma_b R_c d} \right) \tag{8-2-26} $$

If the Q function is upper-bounded by an exponential as indicated in (8-2-22) 
then (8-2-26) can be expressed in the simple form 

$$ P_b < \sum_{d=d_{\text{free}}}^{\infty} \beta_d D^d \Big|_{D = e^{-\gamma_b R_c}} = \frac{dT(D, N)}{dN} \bigg|_{N=1,\ D = e^{-\gamma_b R_c}} \tag{8-2-27} $$

If $k > 1$, the equivalent bit error probability is obtained by dividing (8-2-26) 
and (8-2-27) by $k$. 
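
The two forms of the bit error bound can be compared numerically. The Python sketch below assumes that the rate 1/3, K = 3 code has the transfer function $T(D, N) = ND^6/(1 - 2ND^2)$; this expression belongs to the earlier discussion of transfer functions and is not restated in this subsection, so it should be treated as an assumption of the example. With it, $\beta_d = (m+1)2^m$ for $d = 6 + 2m$. Function names and the truncation point `d_max` are illustrative.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def beta(d):
    """Coefficient of D^d in D^6 / (1 - 2 D^2)^2 (assumed transfer function)."""
    if d < 6 or d % 2:
        return 0
    m = (d - 6) // 2
    return (m + 1) * 2 ** m

def bit_error_bound_q(snr_per_bit, code_rate=1 / 3, d_max=60):
    """Truncated version of the sum in (8-2-26)."""
    return sum(beta(d) * q_function(math.sqrt(2 * snr_per_bit * code_rate * d))
               for d in range(6, d_max + 1))

def bit_error_bound_exp(snr_per_bit, code_rate=1 / 3):
    """Closed form of (8-2-27) for the assumed transfer function."""
    D = math.exp(-snr_per_bit * code_rate)
    if 2 * D * D >= 1:
        raise ValueError("series does not converge at this SNR")
    return D ** 6 / (1 - 2 * D * D) ** 2

snr = 10 ** (6.0 / 10)   # 6 dB per bit (example value)
print(bit_error_bound_q(snr), bit_error_bound_exp(snr))
```

The exponential form is looser but needs no summation, which is the trade-off noted in the text.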

The expressions for the probability of error given above are based on the 
assumption that the code bits are transmitted by binary coherent PSK. The 
results also hold for four-phase coherent PSK, since this modulation/ 
demodulation technique is equivalent to two independent (phase-quadrature) 
binary PSK systems. Other modulation and demodulation techniques, such as 
coherent and noncoherent binary FSK, can be accommodated by recomputing 
the pairwise error probability $P_2(d)$. That is, a change in the modulation and 
demodulation technique used to transmit the coded information sequence 
affects only the computation of $P_2(d)$. Otherwise, the derivation for $P_b$ remains 
the same. 

Although the above derivation of the error probability for Viterbi decoding 
of a convolutional code applies to binary convolutional codes, it is relatively 
easy to generalize it to nonbinary convolutional codes in which each nonbinary 
symbol is mapped into a distinct waveform. In particular, the coefficients 
$\{\beta_d\}$ in the expansion of the derivative of $T(D, N)$, given in (8-2-25), 
represent the number of symbol errors in two paths separated in distance 
(measured in terms of symbols) by $d$ symbols. Again, we denote the probability of 
error in a pairwise comparison of two paths that are separated in distance by $d$ 
as $P_2(d)$. Then the symbol error probability, for a $k$-bit symbol, is 
upper-bounded by 

$$ P_M < \sum_{d=d_{\text{free}}}^{\infty} \beta_d P_2(d) $$

The symbol error probability can be converted into an equivalent bit error 
probability. For example, if $2^k$ orthogonal waveforms are used to transmit the 
$k$-bit symbols, the equivalent bit error probability is $P_M$ multiplied by a 
factor $2^{k-1}/(2^k - 1)$, as shown in Chapter 5. 


8-2-4 Probability of Error for Hard-Decision Decoding 

We now consider the performance achieved by the Viterbi decoding algorithm 
on a binary symmetric channel. For hard-decision decoding of the convolu- 
tional code, the metrics in the Viterbi algorithm are the Hamming distances 
between the received sequence and the $2^{k(K-1)}$ surviving sequences at each 
node of the trellis. 

As in our treatment of soft-decision decoding, we begin by determining the 
first-event error probability. The all-zero path is assumed to be transmitted. 
Suppose that the path being compared with the all-zero path at some node $B$ 
has distance $d$ from the all-zero path. If $d$ is odd, the all-zero path will be 
correctly selected if the number of errors in the received sequence is less than 
$\tfrac{1}{2}(d + 1)$; otherwise, the incorrect path will be selected. Consequently, 
the probability of selecting the incorrect path is 

$$ P_2(d) = \sum_{k=(d+1)/2}^{d} \binom{d}{k} p^k (1 - p)^{d-k} \tag{8-2-28} $$

where $p$ is the probability of a bit error for the binary symmetric channel. If $d$ 
is even, the incorrect path is selected when the number of errors exceeds 
$\tfrac{1}{2}d$. If the number of errors equals $\tfrac{1}{2}d$, there is a tie between 
the metrics in the two paths, which may be resolved by randomly selecting one of 
the paths; thus, an error occurs half the time. Consequently, the probability of 
selecting the incorrect path is 

$$ P_2(d) = \sum_{k=d/2+1}^{d} \binom{d}{k} p^k (1 - p)^{d-k} + \frac{1}{2} \binom{d}{d/2} p^{d/2} (1 - p)^{d/2} \tag{8-2-29} $$
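
The odd and even cases (8-2-28) and (8-2-29) can be evaluated with a few lines of Python. The sketch below also includes the bound $[4p(1-p)]^{d/2}$ for comparison. Function names and the sample values of p are illustrative assumptions.

```python
from math import comb

def pairwise_error_hard(d, p):
    """P_2(d) on a BSC with crossover probability p, per (8-2-28)/(8-2-29)."""
    def term(k):
        return comb(d, k) * p ** k * (1 - p) ** (d - k)
    if d % 2:                       # d odd: error if at least (d+1)/2 bit errors
        return sum(term(k) for k in range((d + 1) // 2, d + 1))
    tail = sum(term(k) for k in range(d // 2 + 1, d + 1))
    tie = 0.5 * term(d // 2)        # ties are broken at random
    return tail + tie

def chernoff_bound(d, p):
    """Upper bound [4p(1-p)]^(d/2) on P_2(d)."""
    return (4 * p * (1 - p)) ** (d / 2)

for d in (5, 6, 7, 8):
    print(d, pairwise_error_hard(d, 0.05), chernoff_bound(d, 0.05))
```

For d = 1 and d = 2 both expressions reduce to p, which gives a quick sanity check on the tie-breaking term.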

As indicated in Section 8-2-3, there are many possible paths with different 
distances that merge with the all-zero path at a given node. Therefore, there is 
no simple exact expression for the first-event error probability. However, we 
can overbound this error probability by the sum of the pairwise error 
probabilities $P_2(d)$ over all possible paths that merge with the all-zero path at 
the given node. Thus, we obtain the union bound 

$$ P_e < \sum_{d=d_{\text{free}}}^{\infty} a_d P_2(d) \tag{8-2-30} $$

where the coefficients $\{a_d\}$ represent the number of paths corresponding to the 
set of distances $\{d\}$. These coefficients are the coefficients in the expansion of 
the transfer function $T(D)$ or $T(D, N)$. 

Instead of using the expressions for $P_2(d)$ given in (8-2-28) and (8-2-29), we 
can use the upper bound 

$$ P_2(d) < [4p(1 - p)]^{d/2} \tag{8-2-31} $$

which was given in Section 8-1-5. Use of this bound in (8-2-30) yields a looser 
upper bound on the first-event error probability, in the form 

$$ P_e < \sum_{d=d_{\text{free}}}^{\infty} a_d [4p(1 - p)]^{d/2} = T(D) \Big|_{D = \sqrt{4p(1-p)}} \tag{8-2-32} $$
Let us now determine the probability of a bit error. As in the case of 
soft-decision decoding, we make use of the fact that the exponents in the 
factors of $N$ that appear in the transfer function $T(D, N)$ indicate the number 
of nonzero information bits that are in error when an incorrect path is selected 
over the all-zero path. By differentiating $T(D, N)$ with respect to $N$ and setting 
$N = 1$, the exponents of $N$ become multiplication factors of the corresponding 
error-event probabilities $P_2(d)$. Thus, we obtain the expression for the upper 
bound on the bit error probability, in the form 

$$ P_b < \sum_{d=d_{\text{free}}}^{\infty} \beta_d P_2(d) \tag{8-2-33} $$

where the $\{\beta_d\}$ are the coefficients in the expansion of the derivative of 
$T(D, N)$, evaluated at $N = 1$. For $P_2(d)$, we may use either the expressions 
given in (8-2-28) and (8-2-29) or the upper bound in (8-2-31). If the latter is 
used, the upper bound on $P_b$ can be expressed as 

$$ P_b < \frac{dT(D, N)}{dN} \bigg|_{N=1,\ D = \sqrt{4p(1-p)}} \tag{8-2-34} $$

When $k > 1$, the results given in (8-2-33) and (8-2-34) for $P_b$ should be divided 
by $k$. 

A comparison of the error probability for the rate 1/3, $K = 3$ convolutional 
code with soft-decision decoding and hard-decision decoding is made in Fig. 
8-2-15. Note that the Chernoff upper bound given by (8-2-34) is less than 1 dB 
above the tighter upper bound given by (8-2-33) in conjunction with (8-2-28) 
and (8-2-29). The advantage of the Chernoff bound is its computational 
simplicity. In comparing the performance between soft-decision and 
hard-decision decoding, note that the difference obtained from the upper bounds is 
approximately 2.5 dB for $10^{-6} \le P_b \le 10^{-2}$. 

FIGURE 8-2-15 Comparison of soft-decision and hard-decision decoding 
for K = 3, k = 1, n = 3 convolutional code. 
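
The comparison between the two decoding modes can be explored with the exponential (Chernoff) forms of the bounds. The Python sketch below rests on two assumptions that are not stated in this subsection: the derivative of the transfer function for the rate 1/3, K = 3 code is taken as $D^6/(1 - 2D^2)^2$, and the crossover probability of the binary symmetric channel is taken as $p = Q(\sqrt{2\gamma_b R_c})$, the usual value for hard decisions on coherent binary PSK. Function names are illustrative.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def dT_dN(D):
    """Assumed derivative of T(D, N) at N = 1 for the rate 1/3, K = 3 code."""
    if 2 * D * D >= 1:
        raise ValueError("series does not converge for this D")
    return D ** 6 / (1 - 2 * D * D) ** 2

def soft_bound(snr_db, code_rate=1 / 3):
    """Bit error bound (8-2-27) with D = exp(-gamma_b * R_c)."""
    gamma_b = 10 ** (snr_db / 10)
    return dT_dN(math.exp(-gamma_b * code_rate))

def hard_bound(snr_db, code_rate=1 / 3):
    """Bit error bound (8-2-34) with D = sqrt(4 p (1 - p))."""
    gamma_b = 10 ** (snr_db / 10)
    p = q_function(math.sqrt(2 * gamma_b * code_rate))  # assumed BSC crossover
    return dT_dN(math.sqrt(4 * p * (1 - p)))

for snr_db in (6, 8, 10):
    print(snr_db, soft_bound(snr_db), hard_bound(snr_db))
```

Reading off the SNR at which each bound reaches a target error rate gives a rough estimate of the gap between hard and soft decisions for this code.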

Finally, we should mention that the ensemble average error rate 
performance of a convolutional code on a discrete memoryless channel, just as in 
the case of a block code, can be expressed in terms of the cutoff rate parameter 
$R_0$ as (for the derivation, see Viterbi and Omura, 1979) 

$$ \bar{P}_b < \frac{(q - 1)\, q^{-K R_0 / R_c}}{\left[ 1 - q^{-(R_0 - R_c)/R_c} \right]^2}, \qquad R_c \le R_0 $$

where $q$ is the number of channel input symbols, $K$ is the constraint length of 
the code, $R_c$ is the code rate, and $R_0$ is the cutoff rate defined in Sections 7-2 
and 8-1. Therefore, conclusions reached by computing $R_0$ for various channel 
conditions apply to both block codes and convolutional codes. 


8-2-5 Distance Properties of Binary Convolutional Codes 

In this subsection, we shall tabulate the minimum free distance and the 
generators for several binary, short-constraint-length convolutional codes for 
several code rates. These binary codes are optimal in the sense that, for a given 
rate and a given constraint length, they have the largest possible 
$d_{\text{free}}$. The generators and the corresponding values of $d_{\text{free}}$ 
tabulated below have been obtained by Odenwalder (1970), Larsen (1973), 
Paaske (1974), and Daut et al. (1982) using computer search methods. 

Heller (1968) has derived a relatively simple upper bound on the minimum 
free distance of a rate $1/n$ convolutional code. It is given by 

$$ d_{\text{free}} \le \min_{l \ge 1} \left\lfloor \frac{2^{l-1}}{2^l - 1} (K + l - 1)\, n \right\rfloor \tag{8-2-35} $$

where $\lfloor x \rfloor$ denotes the largest integer contained in $x$. For purposes of 
comparison, this upper bound is also given in the tables for the rate $1/n$ codes. 
For rate $k/n$ convolutional codes, Daut et al. (1982) have given a modification to 
Heller's bound. The values obtained from this upper bound for $k/n$ codes are 
also tabulated. 
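
Heller's bound (8-2-35) is easy to evaluate by direct search over $l$. The Python sketch below uses integer arithmetic so that the floor is exact. The function name and the search limit `max_l` are assumptions of the example; the limit is adequate for short constraint lengths because the bracketed term eventually grows with $l$.

```python
def heller_bound(K, n, max_l=32):
    """Upper bound (8-2-35) on d_free for a rate 1/n code of constraint length K."""
    return min((2 ** (l - 1) * (K + l - 1) * n) // (2 ** l - 1)
               for l in range(1, max_l + 1))

for K in (3, 4, 5):
    print(K, heller_bound(K, 2), heller_bound(K, 3))
```

For example, K = 3 gives 5 for n = 2 and 8 for n = 3.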

Tables 8-2-1 to 8-2-7 list the parameters of rate $1/n$ convolutional codes for 
$n = 2, 3, \ldots, 8$. Tables 8-2-8 to 8-2-11 list the parameters of several rate 
$k/n$ convolutional codes for $k \le 4$ and $n \le 8$. 


8-2-6 Nonbinary Dual-k Codes and Concatenated Codes 

Our treatment of convolutional codes thus far has been focused primarily on 
binary codes. Binary codes are particularly suitable for channels in which 
binary or quaternary PSK modulation and coherent demodulation is possible. 
TABLE 8-2-1 RATE 1/2 MAXIMUM FREE DISTANCE CODES 

Constraint     Generators                  Upper bound
length K       in octal          d_free    on d_free
    3              5       7        5          5
    4             15      17        6          6
    5             23      35        7          8
    6             53      75        8          8
    7            133     171       10         10
    8            247     371       10         11
    9            561     753       12         12
   10          1,167   1,545       12         13
   11          2,335   3,661       14         14
   12          4,335   5,723       15         15
   13         10,533  17,661       16         16
   14         21,675  27,123       16         17

Sources: Odenwalder (1970) and Larsen (1973). 


However, there are many applications in which PSK modulation and coherent 
demodulation is not suitable or possible. In such cases, other modulation 
techniques, e.g., M-ary FSK, are employed in conjunction with noncoherent 
demodulation. Nonbinary codes are particularly matched to M-ary signals that 
are demodulated noncoherently. 

In this subsection, we describe a class of nonbinary convolutional codes, 
called dual-k codes, that are easily decoded by means of the Viterbi algorithm 
using either soft-decision or hard-decision decoding. They are also suitable 
either as an outer code or as an inner code in a concatenated code, as will also 
be described below. 


TABLE 8-2-2 RATE 1/3 MAXIMUM FREE DISTANCE CODES 

Constraint                                         Upper bound
length K       Generators in octal       d_free    on d_free
    3              5       7       7        8          8
    4             13      15      17       10         10
    5             25      33      37       12         12
    6             47      53      75       13         13
    7            133     145     175       15         15
    8            225     331     367       16         16
    9            557     663     711       18         18
   10          1,117   1,365   1,633       20         20
   11          2,353   2,671   3,175       22         22
   12          4,767   5,723   6,265       24         24
   13         10,533  10,675  17,661       24         24
   14         21,645  35,661  37,133       26         26

Sources: Odenwalder (1970) and Larsen (1973). 
TABLE 8-2-3 RATE 1/4 MAXIMUM FREE DISTANCE CODES 

Constraint                                                 Upper bound
length K       Generators in octal               d_free    on d_free
    3              5       7       7       7       10         10
    4             13      15      15      17       13         15
    5             25      27      33      37       16         16
    6             53      67      71      75       18         18
    7            135     135     147     163       20         20
    8            235     275     313     357       22         22
    9            463     535     733     745       24         24
   10          1,117   1,365   1,633   1,653       27         27
   11          2,327   2,353   2,671   3,175       29         29
   12          4,767   5,723   6,265   7,455       32         32
   13         11,145  12,477  15,537  16,727       33         33
   14         21,113  23,175  35,527  35,537       36         36

Source: Larsen (1973). 


TABLE 8-2-4 RATE 1/5 MAXIMUM FREE DISTANCE CODES 

Constraint                                               Upper bound
length K       Generators in octal             d_free    on d_free
    3            7     7     7     5     5       13         13
    4           17    17    13    15    15       16         16
    5           37    27    33    25    35       20         20
    6           75    71    73    65    57       22         22
    7          175   131   135   135   147       25         25
    8          257   233   323   271   357       28         28

Source: Daut et al. (1982). 


TABLE 8-2-5 RATE 1/6 MAXIMUM FREE DISTANCE CODES 

Constraint                                                     Upper bound
length K       Generators in octal                   d_free    on d_free
    3            7     7     7     7     5     5       16         16
    4           17    17    13    13    15    15       20         20
    5           37    35    27    33    25    35       24         24
    6           73    75    55    65    47    57       27         27
    7          173   151   135   135   163   137       30         30
    8          253   375   331   235   313   357       34         34

Source: Daut et al. (1982). 
TABLE 8-2-6 RATE 1/7 MAXIMUM FREE DISTANCE CODES 

Constraint                                                           Upper bound
length K       Generators in octal                         d_free    on d_free
    3            7     7     7     7     5     5     5       18         18
    4           17    17    13    13    13    15    15       23         23
    5           35    27    25    27    33    35    37       28         28
    6           53    75    65    75    47    67    57       32         32
    7          165   145   173   135   135   147   137       36         36
    8          275   253   375   331   235   313   357       40         40

Source: Daut et al. (1982). 


TABLE 8-2-7 RATE 1/8 MAXIMUM FREE DISTANCE CODES 

Constraint                                                                 Upper bound
length K       Generators in octal                               d_free    on d_free
    3            7     7     5     5     5     7     7     7       21         21
    4           17    17    13    13    13    15    15    17       26         26
    5           37    33    25    25    35    33    27    37       32         32
    6           57    73    51    65    75    47    67    57       36         36
    7          153   111   165   173   135   135   147   137       40         40
    8          275   275   253   371   331   235   313   357       45         45

Source: Daut et al. (1982). 


TABLE 8-2-8 RATE 2/3 MAXIMUM FREE DISTANCE CODES 

Constraint                                   Upper bound
length K       Generators in octal  d_free   on d_free
    2           17    06    15         3         4
    3           27    75    72         5         6
    4          236   155   337         7         7

Source: Daut et al. (1982). 
TABLE 8-2-9 RATE k/5 MAXIMUM FREE DISTANCE CODES 

        Constraint                                              Upper bound
Rate    length K      Generators in octal             d_free    on d_free
2/5         2          17    07    11    12    04        6          6
            3          27    71    52    65    57       10         10
            4         247   366   171   266   373       12         12
3/5         2          35    23    75    61    47        5          5
4/5         2         237   274   156   255   337        3          4

Source: Daut et al. (1982). 


TABLE 8-2-10 RATE k/7 MAXIMUM FREE DISTANCE CODES 

        Constraint                                                          Upper bound
Rate    length K      Generators in octal                         d_free    on d_free
2/7         2          05    06    12    15    15    13    17        9          9
            3          33    55    72    47    25    53    75       14         14
            4         312   125   247   366   171   266   373       18         18
3/7         2          45    21    36    62    57    43    71        8          8
4/7         2         130   067   237   274   156   255   337        6          7

Source: Daut et al. (1982). 


TABLE 8-2-11 RATES 3/4 AND 3/8 MAXIMUM FREE DISTANCE CODES 

        Constraint                                                           Upper bound
Rate    length K      Generators in octal                          d_free    on d_free
3/4         2          13    25    61    47
3/8         2          15    42    23    61    51    36    75    47

Source: Daut et al. (1982). 


A dual-k rate 1/2 convolutional encoder may be represented as shown in 
Fig. 8-2-16. It consists of two ($K = 2$) $k$-bit shift-register stages and $n = 2k$ 
function generators. Its output is two $k$-bit symbols. We note that the code 
considered in Example 8-2-3 is a dual-2 convolutional code. 

FIGURE 8-2-16 Encoder for rate 1/2 dual-k codes. 


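The $2k$ function generators for the dual-k codes have been given by Viterbi 
and Jacobs (1975). These may be expressed in the form 

$$
\begin{bmatrix} g_1 \\ g_2 \\ \vdots \\ g_k \end{bmatrix}
=
\left[ \begin{array}{ccccc|ccccc}
1 & 0 & 0 & \cdots & 0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 & 0 & 1 & 0 & \cdots & 0 \\
\vdots & & & & \vdots & \vdots & & & & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 & 0 & 0 & \cdots & 1
\end{array} \right]
= [\, I_k \;\; I_k \,]
$$

$$
\begin{bmatrix} g_{k+1} \\ g_{k+2} \\ g_{k+3} \\ \vdots \\ g_{2k-1} \\ g_{2k} \end{bmatrix}
=
\left[ \begin{array}{cccccc|c}
1 & 1 & 0 & 0 & \cdots & 0 & \\
0 & 0 & 1 & 0 & \cdots & 0 & \\
0 & 0 & 0 & 1 & \cdots & 0 & I_k \\
\vdots & & & & & \vdots & \\
0 & 0 & 0 & 0 & \cdots & 1 & \\
1 & 0 & 0 & 0 & \cdots & 0 &
\end{array} \right]
\tag{8-2-36}
$$

where $I_k$ denotes the $k \times k$ identity matrix. 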

The general form for the transfer function of a rate 1/2 dual-k code has 
been derived by Odenwalder (1976). It is expressed as 

$$ T(D, N, J) = \frac{(2^k - 1) D^4 J^2 N}{1 - NJ[2D + (2^k - 3) D^2]} = \sum_{i=4}^{\infty} a_i D^i N^{f(i)} J^{h(i)} \tag{8-2-37} $$

where $D$ represents the Hamming distance for the $q$-ary ($q = 2^k$) symbols, the 
$f(i)$ exponent on $N$ represents the number of information symbol errors that 
are produced in selecting a branch in the tree or trellis other than a 
corresponding branch on the all-zero path, and the $h(i)$ exponent on $J$ is equal 
to the number of branches in a given path. Note that the minimum free 
distance is $d_{\text{free}} = 4$ symbols ($4k$ bits). 

Lower-rate dual-k convolutional codes can be generated in a number of 
ways, the simplest of which is to repeat each symbol generated by the rate 1/2 
code $r$ times, where $r = 1, 2, \ldots, m$ ($r = 1$ corresponds to each symbol 
appearing once). If each symbol in any particular branch of the tree or trellis 
or state diagram is repeated $r$ times, the effect is to increase the distance 
parameter from $D$ to $D^r$. Consequently the transfer function for a rate $1/2r$ 
dual-k code is 

$$ T(D, N, J) = \frac{(2^k - 1) D^{4r} J^2 N}{1 - NJ[2D^r + (2^k - 3) D^{2r}]} \tag{8-2-38} $$


In the transmission of long information sequences, the path length 
parameter $J$ in the transfer function may be suppressed by setting $J = 1$. The 
resulting transfer function $T(D, N)$ may be differentiated with respect to $N$, 
and $N$ is set to unity. This yields 

$$ \frac{dT(D, N)}{dN} \bigg|_{N=1} = \frac{(2^k - 1) D^{4r}}{[1 - 2D^r - (2^k - 3) D^{2r}]^2} = \sum_{i=4r}^{\infty} \beta_i D^i \tag{8-2-39} $$

where $\beta_i$ represents the number of symbol errors associated with a path having 
distance $D^i$ from the all-zero path, as described previously in Section 8-2-3. 
The expression in (8-2-39) may be used to evaluate the error probability for 
dual-k codes under various channel conditions. 
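
The coefficients $\beta_i$ in (8-2-39) can be generated by expanding the rational function as a power series in $x = D^r$. The Python sketch below does this with a two-term recurrence for $1/(1 - 2x - cx^2)$, where $c = 2^k - 3$, followed by a self-convolution to square the series. The function name and the `terms` parameter are illustrative assumptions.

```python
def dual_k_beta(k, r=1, terms=5):
    """First `terms` coefficients of (8-2-39), returned as {distance: beta}."""
    c = 2 ** k - 3
    # Series of 1 / (1 - 2x - c x^2): a_n = 2 a_{n-1} + c a_{n-2}, a_0 = 1, a_1 = 2.
    a = [1, 2]
    while len(a) < terms:
        a.append(2 * a[-1] + c * a[-2])
    a = a[:terms]
    # Squaring the series is a discrete convolution of a with itself.
    squared = [sum(a[i] * a[n - i] for i in range(n + 1)) for n in range(terms)]
    # The numerator (2^k - 1) x^4 scales the series and shifts it by four powers of x.
    return {(4 + n) * r: (2 ** k - 1) * squared[n] for n in range(terms)}

print(dual_k_beta(2))        # rate 1/2 dual-2 code
print(dual_k_beta(3, r=2))   # rate 1/4 dual-3 code
```

The smallest key in each result is $4r$, in agreement with the minimum free distance of $4r$ symbols.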


Performance of Dual-k Codes with M-ary Modulation Suppose that a 
dual-k code is used in conjunction with M-ary orthogonal signaling at the 
modulator, where $M = 2^k$. Each symbol from the encoder is mapped into one 
of the $M$ possible orthogonal waveforms. The channel is assumed to add white 
gaussian noise. The demodulator consists of $M$ matched filters. 

If the decoder performs hard-decision decoding, the performance of the 
code is determined by the symbol error probability $P_M$. This error probability 
has been computed in Chapter 5 for both coherent and noncoherent detection. 
From $P_M$, we can determine $P_2(d)$ according to (8-2-28) or (8-2-29), which is 
the probability of error in a pairwise comparison of the all-zero path with a 
path that differs in $d$ symbols. The probability of a bit error is upper-bounded 
as 

$$ P_b < \frac{2^{k-1}}{2^k - 1} \sum_{d=4r}^{\infty} \beta_d P_2(d) \tag{8-2-40} $$

The factor $2^{k-1}/(2^k - 1)$ is used to convert the symbol error probability to the 
bit error probability. 
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Instead of hard-decision decoding, suppose that the decoder performs 
soft-decision decoding using the output of a demodulator that employs a 
square-law detector. The expression for the bit error probability given by 
(8-2-40) still applies, but now P 2 (d) is given by (see Section 12-1-1) 

P2(d) = ^cxp(-h b Rcd)l K,(hM (8-2-41) 

where 



and R c = l/2r is the code rate. This expression follows from the result (8-1 -63). 
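As a numerical illustration of (8-2-40) and (8-2-41), the sketch below evaluates the soft-decision pairwise error probability and a truncated version of the bit-error bound. It assumes the form of (8-2-41) with $K_i = \frac{1}{i!}\sum_{l=0}^{d-1-i}\binom{2d-1}{l}$ and takes $\gamma_b$ as a linear ratio, not in dB. The function names and the `betas` dictionary (distance mapped to $\beta_d$) are hypothetical.

```python
from math import comb, exp, factorial


def pairwise_error_square_law(d, gamma_b, code_rate):
    """P_2(d) for square-law detection, in the form assumed for (8-2-41)."""
    x = 0.5 * gamma_b * code_rate * d
    total = 0.0
    for i in range(d):
        # K_i = (1/i!) * sum_{l=0}^{d-1-i} C(2d-1, l)
        k_i = sum(comb(2 * d - 1, l) for l in range(d - i)) / factorial(i)
        total += k_i * x**i
    return exp(-x) * total / 2 ** (2 * d - 1)


def bit_error_bound(k, betas, gamma_b, code_rate):
    """Truncated upper bound (8-2-40); betas maps distance d to beta_d."""
    factor = 2 ** (k - 1) / (2**k - 1)
    return factor * sum(
        beta * pairwise_error_square_law(d, gamma_b, code_rate)
        for d, beta in betas.items()
    )
```

Two sanity checks on the assumed form: for d = 1 it reduces to $\frac{1}{2}\exp(-\frac{1}{2}\gamma_b R_c)$, and at $\gamma_b = 0$ it gives 1/2 for every d. Because only finitely many terms are summed, the result is an approximation to the bound rather than the bound itself.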

Concatenated Codes In Section 8-1-8, we considered the concatenation of 
two block codes to form a long block code. Now that we have described 
convolutional codes, we broaden our viewpoint and consider the concatenation 
of a block code with a convolutional code or the concatenation of two 
convolutional codes. 

As described previously, the outer code is usually chosen to be nonbinary, 
with each symbol selected from an alphabet of $q = 2^k$ symbols. This code may 
be a block code, such as a Reed-Solomon code, or a convolutional code, such 
as a dual-k code. The inner code may be either binary or nonbinary, and either 
a block or a convolutional code. For example, a Reed-Solomon code may be 
selected as the outer code and a dual-k code may be selected as the inner code. 
In such a concatenation scheme, the number of symbols in the outer 
(Reed-Solomon) code q equals $2^k$, so that each symbol of the outer code maps 
into a k-bit symbol of the inner dual-k code. M-ary orthogonal signals may be 
used to transmit the symbols. 

The decoding of such concatenated codes may also take a variety of 
different forms. If the inner code is a convolutional code having a short 
constraint length, the Viterbi algorithm provides an efficient means for 
decoding, using either soft-decision or hard-decision decoding. 

If the inner code is a block code, and the decoder for this code performs 
soft-decision decoding, the outer decoder may also perform soft-decision 
decoding using as inputs the metrics corresponding to each word of the inner 
code. On the other hand, the inner decoder may make a hard decision after 
receipt of the code word and feed the hard decisions to the outer decoder. 
Then the outer decoder must perform hard-decision decoding. 

The following example describes a concatenated code in which the outer 
code is a convolutional code and the inner code is a block code. 


Example 8-2-5 

Suppose we construct a concatenated code by selecting a dual-k code as the 
outer code and a Hadamard code as the inner code. To be specific, we select 
a rate 1/2 dual-5 code and a Hadamard (16, 5) inner code. The dual-5 rate 
1/2 code has a minimum free distance $D_{free} = 4$ and the Hadamard code has 
a minimum distance $d_{min} = 8$. Hence, the concatenated code has an effective 
minimum distance of 32. Since there are 32 code words in the Hadamard 
code and 32 possible symbols in the outer code, in effect, each symbol from 
the outer code is mapped into one of the 32 Hadamard code words. 

The probability of a symbol error in decoding the inner code may be 
determined from the results of the performance of block codes given in 
Sections 8-1-4 and 8-1-5 for soft-decision and hard-decision decoding, 
respectively. First, suppose that hard-decision decoding is performed in the 
inner decoder with the probability of a code word (symbol of outer code) 
error denoted as $P_{32}$, since $M = 32$. Then the performance of the outer code 
and, hence, the performance of the concatenated code is obtained by using 
this error probability in conjunction with the transfer function for the dual-5 
code given by (8-2-37). 

On the other hand, if soft-decision decoding is used on both the outer 
and the inner codes, the soft-decision metric from each received Hadamard 
code word is passed to the Viterbi algorithm, which computes the 
accumulated metrics for the competing paths through the trellis. We shall 
give numerical results on the performance of concatenated codes of this 
type in our discussion of coding for Rayleigh fading channels. 

8-2-7 Other Decoding Algorithms for Convolutional Codes 

The Viterbi algorithm described in Section 8-2-2 is the optimum decoding 
algorithm (in the sense of maximum-likelihood decoding of the entire 
sequence) for convolutional codes. However, it requires the computation of 
$2^{kK}$ metrics at each node of the trellis and the storage of $2^{k(K-1)}$ metrics and 
$2^{k(K-1)}$ surviving sequences, each of which may be about $5kK$ bits long. The 
computational burden and the storage required to implement the Viterbi 
algorithm make it impractical for convolutional codes with large constraint 
length. 
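To make the growth with constraint length concrete, the following short sketch tabulates the quantities quoted above. The function name and dictionary keys are illustrative assumptions; the formulas are the ones stated in the paragraph.

```python
def viterbi_requirements(k, K):
    """Metric computations and storage quoted in the text for a
    convolutional code with k input bits per branch and constraint length K."""
    n_states = 2 ** (k * (K - 1))
    return {
        "metrics_computed": 2 ** (k * K),         # 2^(kK) metrics
        "stored_metrics": n_states,               # 2^(k(K-1)) path metrics
        "survivor_bits": n_states * 5 * k * K,    # survivors of about 5kK bits each
    }
```

For a binary code (k = 1), going from K = 7 to K = 41 raises the number of stored metrics from 64 to $2^{40}$, which illustrates why Viterbi decoding is restricted to short constraint lengths.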

Prior to the discovery of the optimum algorithm by Viterbi, a number of 
other algorithms had been proposed for decoding convolutional codes. The 
earliest was the sequential decoding algorithm originally proposed by 
Wozencraft (1957, 1961), and subsequently modified by Fano (1963). 

The Fano sequential decoding algorithm searches for the most probable 
path through the tree or trellis by examining one path at a time. The increment 
added to the metric along each branch is proportional to the probability of the 
received signal for that branch, just as in Viterbi decoding, with the exception 
that an additional negative constant is added to each branch metric. The value 
of this constant is selected such that the metric for the correct path will 
increase on the average, while the metric for any incorrect path will decrease 
on the average. By comparing the metric of a candidate path with a moving 
(increasing) threshold, Fano's algorithm detects and discards incorrect paths. 
To be more specific, let us consider a memoryless channel. The metric for 
the ith path through the tree or trellis from the first branch to branch B may be 
expressed as 

$$CM^{(i)} = \sum_{j=1}^{B} \sum_{m=1}^{n} \mu_{jm}^{(i)} \tag{8-2-42}$$

where 

$$\mu_{jm}^{(i)} = \log_2 \frac{p(r_{jm} \mid c_{jm}^{(i)})}{p(r_{jm})} - \mathcal{K} \tag{8-2-43}$$

In (8-2-43), $r_{jm}$ is the demodulator output sequence, $p(r_{jm} \mid c_{jm}^{(i)})$ denotes the pdf 
of $r_{jm}$ conditional on the code bit $c_{jm}^{(i)}$ for the mth bit of the jth branch of the ith 
path, and $\mathcal{K}$ is a positive constant. $\mathcal{K}$ is selected as indicated above so that the 
incorrect paths will have a decreasing metric while the correct path will have 
an increasing metric on the average. Note that the term $p(r_{jm})$ in the 
denominator is independent of the code sequence, and, hence, may be 
subsumed in the constant factor. 

The metric given by (8-2-43) is generally applicable for either hard- or 
soft-decision decoding. However, it can be considerably simplified when 
hard-decision decoding is employed. Specifically, if we have a BSC with 
transition (error) probability p, the metric for each received bit, consistent with 
the form in (8-2-43), is given by 

$$\mu_{jm}^{(i)} = \begin{cases} \log_2[2(1 - p)] - R_c & \text{if } \tilde{r}_{jm} = c_{jm}^{(i)} \\ \log_2 2p - R_c & \text{if } \tilde{r}_{jm} \ne c_{jm}^{(i)} \end{cases} \tag{8-2-44}$$

where $\tilde{r}_{jm}$ is the hard-decision output from the demodulator, $c_{jm}^{(i)}$ is the mth 
code bit in the jth branch of the ith path in the tree, and $R_c$ is the code rate. 
Note that this metric requires some (approximate) knowledge of the error 
probability. 


Example 8-2-6 

Suppose we have a rate $R_c = 1/3$ binary convolutional code for transmitting 
information over a BSC with $p = 0.1$. By evaluating (8-2-44) we find that 

$$\mu_{jm}^{(i)} = \begin{cases} 0.52 & \text{if } \tilde{r}_{jm} = c_{jm}^{(i)} \\ -2.65 & \text{if } \tilde{r}_{jm} \ne c_{jm}^{(i)} \end{cases} \tag{8-2-45}$$

To simplify the computations, the metric in (8-2-45) may be normalized. It is 
well approximated as 

$$\mu_{jm}^{(i)} = \begin{cases} 1 & \text{if } \tilde{r}_{jm} = c_{jm}^{(i)} \\ -5 & \text{if } \tilde{r}_{jm} \ne c_{jm}^{(i)} \end{cases} \tag{8-2-46}$$

Since the code rate is 1/3, there are three output bits from the encoder for 
each input bit. Hence, the branch metric consistent with (8-2-46) is 

$$\mu_j^{(i)} = 3 - 6d$$

or, equivalently, 

$$\mu_j^{(i)} = 1 - 2d \tag{8-2-47}$$

where d is the Hamming distance of the three received bits from the three 
branch bits. Thus, the metric $\mu_j^{(i)}$ is simply related to the Hamming distance 
of the received bits to the code bits in the jth branch of the ith path. 

FIGURE 8-2-17 
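The arithmetic of Example 8-2-6 can be checked with a few lines of code. This is a minimal sketch; the function names are illustrative and not from the text. `fano_bit_metrics` evaluates (8-2-44) directly, and `branch_metric` applies the normalized values +1 and -5 of (8-2-46) to one branch.

```python
from math import log2


def fano_bit_metrics(p, code_rate):
    """Per-bit metrics of (8-2-44) for a BSC: (bits agree, bits differ)."""
    agree = log2(2 * (1 - p)) - code_rate
    differ = log2(2 * p) - code_rate
    return agree, differ


def branch_metric(received, branch_bits, agree=1, differ=-5):
    """Sum of normalized bit metrics over one branch, as in (8-2-46)."""
    d = sum(r != c for r, c in zip(received, branch_bits))  # Hamming distance
    return (len(branch_bits) - d) * agree + d * differ
```

With p = 0.1 and $R_c = 1/3$ the two bit metrics evaluate to about 0.515 and -2.655, which the example quotes as 0.52 and -2.65. For a three-bit branch, `branch_metric` returns 3 - 6d, for instance 3 when all bits agree and -3 when one bit differs.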

Initially, the decoder may be forced to start on the correct path by the 
transmission of a few known bits of data. Then it proceeds forward from node 
to node, taking the most probable (largest metric) branch at each node and 
increasing the threshold such that the threshold is never more than some 
preselected value, say $\tau$, below the metric. Now suppose that the additive noise 
(for soft-decision decoding) or demodulation errors resulting from noise on the 
channel (for hard-decision decoding) cause the decoder to take an incorrect 
path because it appears more probable than the correct path. This is illustrated 
in Fig. 8-2-17. Since the metrics of an incorrect path decrease on the average, 
the metric will fall below the current threshold, say $\tau_0$. When this occurs, the 
decoder backs up and takes alternative paths through the tree or trellis, in 
order of decreasing branch metrics, in an attempt to find another path that 
exceeds the threshold $\tau_0$. If it is successful in finding an alternative path, it 
continues along that path, always selecting the most probable branch at each 
node. On the other hand, if no path exists that exceeds the threshold $\tau_0$, the 
threshold is reduced by an amount $\tau$ and the original path is retraced. If the 
original path does not stay above the new threshold, the decoder resumes its 
backward search for other paths. This procedure is repeated, with the 
threshold reduced by $\tau$ for each repetition, until the decoder finds a path that 
remains above the adjusted threshold. A simplified flow diagram of Fano's 
algorithm is shown in Fig. 8-2-18. 

FIGURE 8-2-18 A simplified flow diagram of Fano's algorithm. [From Jordan 
(1966), © 1966 IEEE.] 

The sequential decoding algorithm requires a buffer memory in the decoder 
to store incoming demodulated data during periods when the decoder is 
searching for alternate paths. When a search terminates, the decoder must be 
capable of processing demodulated bits sufficiently fast to empty the buffer 
prior to commencing a new search. Occasionally, during extremely long 
searches, the buffer may overflow. This causes loss of data, a condition that 
can be remedied by retransmission of the lost information. In this regard, we 
should mention that the cutoff rate $R_0$ has special meaning in sequential 
decoding. It is the rate above which the average number of decoding 
operations per decoded digit becomes infinite, and it is termed the 
computational cutoff rate $R_{comp}$. In practice, sequential decoders usually 
operate at rates near $R_0$. 

The Fano sequential decoding algorithm has been successfully implemented 
in several communication systems. Its error rate performance is comparable to 
that of Viterbi decoding. However, in comparison with Viterbi decoding, 
sequential decoding has a significantly larger decoding delay. On the positive 
side, sequential decoding requires less storage than Viterbi decoding and, 
hence, it appears attractive for convolutional codes with a large constraint 
length. The issues of computational complexity and storage requirements for 
sequential decoding are interesting and have been thoroughly investigated. For 
an analysis of these topics and other characteristics of the Fano algorithm, the 
interested reader may refer to Gallager (1968), Wozencraft and Jacobs (1965), 
Savage (1966), and Forney (1974). 

Another type of sequential decoding algorithm, called a stack algorithm, has 
been proposed independently by Jelinek (1969) and Zigangirov (1966). In 
contrast to the Viterbi algorithm, which keeps track of $2^{(K-1)k}$ paths and 
corresponding metrics, the stack sequential decoding algorithm deals with 
fewer paths and their corresponding metrics. In a stack algorithm, the more 
probable paths are ordered according to their metrics, with the path at the top 
of the stack having the largest metric. At each step of the algorithm, only the 
path at the top of the stack is extended by one branch. This yields $2^k$ successors 
and their corresponding metrics. These $2^k$ successors along with the other paths 
are then reordered according to the values of the metrics and all paths with 
metrics that fall below some preselected amount from the metric of the top 
path may be discarded. Then the process of extending the path with the largest 
metric is repeated. Figure 8-2-19 illustrates the first few steps in a stack 
algorithm. 

FIGURE 8-2-19 
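The steps of the stack algorithm described above can be sketched in a few lines. The following is a hedged illustration, not the implementation of any cited author. It assumes a rate 1/3, K = 3 binary code with generator taps (100), (101), (111), a choice made here for illustration; it uses the normalized bit metrics +1 and -5 of Example 8-2-6; and it keeps every path on the stack, omitting the optional discarding step. All names are hypothetical.

```python
import heapq

# Assumed generator taps (current bit, previous bit, bit before that).
GENERATORS = ((1, 0, 0), (1, 0, 1), (1, 1, 1))


def encode_branch(bit, state):
    """Three code bits for one input bit; state = (previous bit, one before)."""
    window = (bit,) + state
    return tuple(sum(g * w for g, w in zip(gen, window)) % 2 for gen in GENERATORS)


def encode(bits):
    """Encode a bit list starting from the all-zero state."""
    state, out = (0, 0), []
    for b in bits:
        out.extend(encode_branch(b, state))
        state = (b, state[0])
    return out


def stack_decode(received, agree=1, differ=-5):
    """Return (decoded bits, path metric) using a stack ordered by metric."""
    n = len(GENERATORS)
    branches = [tuple(received[i:i + n]) for i in range(0, len(received), n)]
    # heapq is a min-heap, so metrics are stored negated: top = largest metric.
    stack = [(0, ())]
    while True:
        neg_metric, path = heapq.heappop(stack)
        if len(path) == len(branches):
            return list(path), -neg_metric
        padded = (0, 0) + path
        state = (padded[-1], padded[-2])
        rx = branches[len(path)]
        for bit in (0, 1):  # the 2^k = 2 successors of the top path
            code = encode_branch(bit, state)
            d = sum(r != c for r, c in zip(rx, code))
            gain = (n - d) * agree + d * differ
            heapq.heappush(stack, (neg_metric - gain, path + (bit,)))
```

A heap stands in for the explicit reordering of the stack: only the top entry needs to be exact, which is all the algorithm uses. Decoding stops as soon as a full-length path reaches the top.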

It is apparent that when none of the $2^k$ extensions of the path with the 
largest metric remains at the top of the stack, the next step in the search 
involves the extension of another path that has climbed to the top of the stack. 
It follows that the algorithm does not necessarily advance by one branch 
through the trellis in every iteration. Consequently, some amount of storage 
must be provided for newly received signals and previously received signals in 
order to allow the algorithm to extend the search along one of the shorter 
paths, when such a path reaches the top of the stack. 

In a comparison of the stack algorithm with the Viterbi algorithm, the stack 
algorithm requires fewer metric computations, but this computational saving is 
offset to a large extent by the computations involved in reordering the stack 
after every iteration. In comparison with the Fano algorithm, the stack 
algorithm is computationally simpler, since there is no retracing over the same 
path as is done in the Fano algorithm. On the other hand, the stack algorithm 
requires more storage than the Fano algorithm. 

A third alternative to the optimum Viterbi decoder is a method called 
feedback decoding (Heller, 1975), which has been applied to decoding for a 
BSC (hard-decision decoding). In feedback decoding, the decoder makes a 
hard decision on the information bit at stage j based on metrics computed from 
stage j to stage j + m, where m is a preselected positive integer. Thus, the 
decision on the information bit is either 0 or 1 depending on whether the 
minimum Hamming distance path that begins at stage j and ends at stage j + m 
contains a 0 or 1 in the branch emanating from stage j. Once a decision is made 
on the information bit at stage j, only that part of the tree that stems from the 
bit selected at stage j is kept (half the paths emanating from node j) and the 
remaining part is discarded. This is the feedback feature of the decoder. 

The next step is to extend the part of the tree that has survived to stage 
j + 1 + m and consider the paths from stage j + 1 to j + 1 + m in deciding on 
the bit at stage j + 1. Thus, this procedure is repeated at every stage. The 
parameter m is simply the number of stages in the tree that the decoder looks 
ahead before making a hard decision. Since a large value of m results in a large 
amount of storage, it is desirable to select m as small as possible. On the other 
hand, m must be sufficiently large to avoid a severe degradation in performance. 
To balance these two conflicting requirements, m is usually selected in 
the range $K \le m \le 2K$, where K is the constraint length. Note that this 
decoding delay is significantly smaller than the decoding delay in a Viterbi 
decoder, which is usually about 5K. 


Example 8-2-7 

Let us consider the use of a feedback decoder for the rate 1/3 convolutional 
code shown in Fig. 8-2-2. Figure 8-2-20 illustrates the tree diagram and the 
operation of the feedback decoder for m = 2. That is, in decoding the bit at 
branch j, the decoder considers the paths at branches j, j + 1, and j + 2. 
Beginning with the first branch, the decoder computes eight metrics 
(Hamming distances), and decides that the bit for the first branch is 0 if the 
minimum distance path is contained in the upper part of the tree, and 1 if 
the minimum distance path is contained in the lower part of the tree. In this 
example, the received sequence for the first three branches is assumed to be 
101111110, so that the minimum distance path is in the upper part of the 
tree. Hence, the first output bit is 0. 

The next step is to extend the upper part of the tree (the part of the tree 
that has survived) by one branch, and to compute the eight metrics for 
branches 2, 3, and 4. For the assumed received sequence 111110011, the 
minimum-distance path is contained in the lower part of the section of the 
tree that survived from the first step. Hence, the second output bit is 1. The 
third step is to extend this lower part of the tree and to repeat the procedure 
described for the first two steps. 

FIGURE 8-2-20 An example of feedback decoding 
for a rate 1/3 convolutional code. 
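The two decoding steps of Example 8-2-7 can be reproduced with a brute-force sketch of feedback decoding. It assumes that the rate 1/3 encoder of Fig. 8-2-2 has generator taps (100), (101), (111); the figure is not reproduced here, so this choice is an assumption, although it is consistent with the decisions stated in the example. Function names are hypothetical, and ties are broken in favor of the first path enumerated.

```python
from itertools import product

# Assumed generator taps (current bit, previous bit, bit before that).
GENERATORS = ((1, 0, 0), (1, 0, 1), (1, 1, 1))


def encode_from(state, bits):
    """Code bits produced by `bits` when the encoder starts in `state`."""
    out = []
    for b in bits:
        window = (b,) + state
        out.extend(sum(g * w for g, w in zip(gen, window)) % 2 for gen in GENERATORS)
        state = (b, state[0])
    return out


def feedback_decode(received, m=2):
    """Decide each bit from the minimum Hamming distance path over m + 1 branches."""
    n = len(GENERATORS)
    n_branches = len(received) // n
    state, decided = (0, 0), []
    # The last m branches have no full look-ahead window and are left undecided.
    for j in range(n_branches - m):
        rx = received[n * j: n * (j + m + 1)]
        best = min(
            product((0, 1), repeat=m + 1),  # all 2^(m+1) paths from this node
            key=lambda bits: sum(r != c for r, c in zip(rx, encode_from(state, bits))),
        )
        decided.append(best[0])          # hard decision on the first branch only
        state = (best[0], state[0])      # feedback: keep only the chosen subtree
    return decided
```

Applied to the received sequence 101 111 110 011 with m = 2, the sketch decides 0 for the first bit (minimum distance 2 in the upper half of the tree) and 1 for the second, in agreement with the example.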

Instead of computing metrics as described above, a feedback decoder for 
the BSC may be efficiently implemented by computing the syndrome from the 
received sequence and using a table lookup method for correcting errors. This 
method is similar to the one described for decoding block codes. For some 
convolutional codes, the feedback decoder simplifies to a form called a 
majority logic decoder or a threshold decoder (Massey, 1963; Heller, 1975). 


8-2-8 Practical Considerations in the Application of 
Convolutional Codes 

Convolutional codes are widely used in many practical applications of 
communications system design. Viterbi decoding is predominantly used for 
short constraint lengths ($K \le 10$), while sequential decoding is used for long 
constraint length codes, where the complexity of Viterbi decoding becomes 
prohibitive. The choice of constraint length is dictated by the desired coding 
gain. 

From the error probability results for soft-decision decoding given by 
(8-2-26) it is apparent that the coding gain achieved by a convolutional code 
over an uncoded binary PSK or QPSK system is 

$$\text{coding gain} \le 10 \log_{10}(R_c d_{free})$$

TABLE 8-2-12 UPPER BOUNDS ON CODING GAIN FOR SOFT-DECISION DECODING OF SOME 
CONVOLUTIONAL CODES 

           Rate 1/2 codes                        Rate 1/3 codes
Constraint             Upper bound    Constraint             Upper bound
length K     d_free    (dB)           length K     d_free    (dB)
    3           5         3.98            3           8         4.26
    4           6         4.77            4          10         5.23
    5           7         5.44            5          12         6.02
    6           8         6.02            6          13         6.37
    7          10         6.99            7          15         6.99
    8          10         6.99            8          16         7.27
    9          12         7.78            9          18         7.78
   10          12         7.78           10          20         8.24

We also know that the minimum free distance $d_{free}$ can be increased either by 
decreasing the code rate or by increasing the constraint length, or both. Table 
8-2-12 provides a list of upper bounds on the coding gain for several 
convolutional codes. For purposes of comparison, Table 8-2-13 lists the actual 
coding gains and the upper bounds for several short constraint length 
convolutional codes with Viterbi decoding. It should be noted that the coding 
gain increases toward the asymptotic limit as the SNR per bit increases. 

These results are based on soft-decision Viterbi decoding. If hard-decision 
decoding is used, the coding gains are reduced by approximately 2 dB for the 
AWGN channel. 
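The upper bounds of Table 8-2-12 follow directly from $10 \log_{10}(R_c d_{free})$ and can be recomputed as a check. The sketch below is illustrative; the function and dictionary names are assumptions, while the $d_{free}$ values are those listed in the table.

```python
from math import log10


def coding_gain_bound_db(code_rate, d_free):
    """Upper bound on soft-decision coding gain, 10 log10(R_c * d_free), in dB."""
    return 10 * log10(code_rate * d_free)


# d_free by constraint length K, as listed in Table 8-2-12.
RATE_HALF_DFREE = {3: 5, 4: 6, 5: 7, 6: 8, 7: 10, 8: 10, 9: 12, 10: 12}
RATE_THIRD_DFREE = {3: 8, 4: 10, 5: 12, 6: 13, 7: 15, 8: 16, 9: 18, 10: 20}

rate_half_bounds = {K: round(coding_gain_bound_db(1 / 2, d), 2)
                    for K, d in RATE_HALF_DFREE.items()}
rate_third_bounds = {K: round(coding_gain_bound_db(1 / 3, d), 2)
                     for K, d in RATE_THIRD_DFREE.items()}
```

For example, the rate 1/2, K = 3 code gives $10 \log_{10}(2.5) \approx 3.98$ dB and the rate 1/3, K = 10 code gives $10 \log_{10}(20/3) \approx 8.24$ dB, matching the tabulated entries.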

TABLE 8-2-13 CODING GAIN (dB) FOR SOFT-DECISION VITERBI DECODING 
Source: Jacobs (1974); © IEEE. 

Larger coding gains than those listed in the above tables are achieved by 
employing long constraint length convolutional codes, e.g., K = 50, and 
decoding such codes by sequential decoding. Invariably, sequential decoders 
are implemented for hard-decision decoding to reduce complexity. Figure 
8-2-21 illustrates the error rate performance of several constraint-length K = 7 
convolutional codes for rates 1/2 and 1/3 and for sequential decoding (with 
hard decisions) of rate 1/2 and rate 1/3 constraint-length K = 41 
convolutional codes. Note that the K = 41 codes achieve an error rate of $10^{-6}$ 
at 2.5 and 3 dB, which are within 4-4.5 dB of the channel capacity limit, i.e., in 
the vicinity of the cutoff rate limit. However, the rate 1/2 and rate 1/3, K = 7 codes 
with soft-decision Viterbi decoding operate at about 5 and 4.4 dB at $10^{-6}$, 
respectively. These short-constraint-length codes achieve a coding gain of 
about 6 dB at $10^{-6}$, while the long constraint codes gain about 7.5-8 dB. 

FIGURE 8-2-21 Performance of rate 1/2 and rate 1/3 Viterbi and 
sequential decoding. [From Omura and Levitt (1982). © 1982 IEEE.] 

Two important issues in the implementation of Viterbi decoding are 

1 the effect of path memory truncation, which is a desirable feature that 
ensures a fixed decoding delay, and 

2 the degree of quantization of the input signal to the Viterbi decoder. 

As a rule of thumb, we stated that path memory truncation to about five 
constraint lengths has been found to result in negligible performance loss. 
Figure 8-2-22 illustrates the performance obtained by simulation for rate 1/2, 
constraint-length K = 3, 5, and 7 codes with memory path length of 32 bits. In 
addition to path memory truncation, the computations were performed with 
eight-level (three bits) quantized input signals from the demodulator. The 
broken curves are performance results obtained from the upper bound in the 
bit error rate given by (8-2-26). Note that the simulation results are close to the 
theoretical upper bounds, which indicates that the degradation due to path 
memory truncation and quantization of the input signal has a minor effect on 
performance (0.20-0.30 dB). 
FIGURE 8-2-22 Bit error probability for rate 1/2 Viterbi decoding with 
eight-level quantized inputs to the decoder and 32-bit path 
memory. [From Heller and Jacobs (1971). © 1971 IEEE.] 

Figure 8-2-23 illustrates the bit error rate performance obtained via 
simulation for hard-decision decoding of convolutional codes with K = 3-8. 
Note that with the K = 8 code, an error rate of $10^{-5}$ requires about 6 dB, which 
represents a coding gain of nearly 4 dB relative to uncoded QPSK. 

FIGURE 8-2-23 Performance of rate 1/2 codes with hard-decision Viterbi 
decoding and 32-bit path memory truncation. 
[From Heller and Jacobs (1971). © 1971 IEEE.] 

The effect of input signal quantization is further illustrated in Fig. 8-2-24 for 
a rate 1/2, K = 5 code. Note that three-bit quantization (eight levels) is about 
2 dB better than hard-decision decoding, which is the ultimate limit between 
soft-decision decoding and hard-decision decoding on the AWGN channel. 
The combined effect of signal quantization and path memory truncation for the 
rate 1/2, K = 5 code with 8-, 16-, and 32-bit path memories and either one- or 
three-bit quantization is shown in Fig. 8-2-25. It is apparent from these results 
that a path memory as short as three constraint lengths does not seriously 
degrade performance. 

FIGURE 8-2-24 Performance of rate 1/2, K = 5 code with eight-, four-, and 
two-level quantization at the input to the Viterbi decoder. 
Path truncation length = 32 bits. [From Heller and Jacobs 
(1971). © 1971 IEEE.] 

FIGURE 8-2-25 Performance of rate 1/2, K = 5 code with 32-, 16-, and 8-bit 
path memory truncation and eight- and two-level 
quantization. [From Heller and Jacobs (1971). © 1971 IEEE.] 

FIGURE 8-2-26 Error rate performance of rate 1/2, K = 5 Viterbi decoder 
for $\mathcal{E}_b/N_0 = 3.5$ dB and eight-level quantization as a function 
of quantizer threshold level spacing for equally spaced 
thresholds. [From Heller and Jacobs (1971). © 1971 IEEE.] 

When the signal from the demodulator is quantized to more than two levels, 
another problem that must be considered is the spacing between quantization 
levels. Figure 8-2-26 illustrates the simulation results for an eight-level uniform 
quantizer as a function of the quantizer threshold spacing. We observe that 
there is an optimum spacing between thresholds (approximately equal to 0.5). 
However, the optimum is sufficiently broad (0.4-0.7) so that, once it is set, 
there is little degradation resulting from variations in the AGC level of the 
order of ±20%. 
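An eight-level uniform quantizer with equally spaced thresholds, of the kind discussed above, can be sketched as follows. This is an illustration only: the function name is hypothetical, the seven thresholds are assumed to be placed symmetrically about zero, and the spacing is taken to be in the same (normalized) units as the demodulator output, since the text does not state the units.

```python
def quantize_8_level(x, spacing=0.5):
    """Map a demodulator output x to a level index 0..7.

    Thresholds sit at -3T, -2T, -T, 0, T, 2T, 3T with T = spacing
    (assumed symmetric placement). Index 0 is the most negative level.
    """
    thresholds = [spacing * t for t in (-3, -2, -1, 0, 1, 2, 3)]
    # The level index is the number of thresholds lying below x.
    return sum(x > t for t in thresholds)
```

With the default spacing, an input of 0.1 maps to level 4 and an input of -0.1 to level 3, so the most significant bit of the three-bit index carries the hard decision while the remaining two bits carry reliability information.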
FIGURE 8-2-27 Performance of a rate 1/2, K = 7 code with Viterbi 
decoding and eight-level quantization as a function of 
the carrier phase tracking loop SNR $\gamma_L$. [From Heller 
and Jacobs (1971). © 1971 IEEE.] 

Finally, we should point out some important results in the performance 
degradation due to carrier phase variations. Figure 8-2-27 illustrates the 
performance of a rate 1/2, K = 7 code with eight-level quantization and a 
carrier phase tracking loop SNR $\gamma_L$. Recall that in a PLL, the phase error has 
a variance that is inversely proportional to $\gamma_L$. The results in Fig. 8-2-27 
indicate that the degradation is large when the loop SNR is small ($\gamma_L < 12$ dB), 
and causes the error rate performance to bottom out at a relatively high error 
rate. 

8-3 CODED MODULATION FOR BANDWIDTH- 
CONSTRAINED CHANNELS 

In the treatment of block and convolutional codes in Sections 8-1 and 8-2, 
respectively, performance improvement was achieved by expanding the 
bandwidth of the transmitted signal by an amount equal to the reciprocal of the 
code rate. Recall for example that the improvement in performance achieved 
by an (n, k) binary block code with soft-decision decoding is approximately 
$10 \log_{10}(R_c d_{min} - k \ln 2/\gamma_b)$ compared with uncoded binary or quaternary 
PSK. For example, when $\gamma_b = 10$ the (24, 12) extended Golay code gives a 
coding gain of 5 dB. This coding gain is achieved at a cost of doubling the 
bandwidth of the transmitted signal and, of course, at the additional cost in 
receiver implementation complexity. Thus, coding provides an effective 
method for trading bandwidth and implementation complexity against 
transmitter power. This situation applies to digital communications systems that are 
designed to operate in the power-limited region where $R/W < 1$. 
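The 5 dB figure quoted for the extended Golay code can be verified from the approximate expression above. The sketch below is illustrative; the function name is an assumption, and $\gamma_b$ is taken as a linear ratio, as in the text's example.

```python
from math import log, log10


def block_code_gain_db(n, k, d_min, gamma_b):
    """Approximate soft-decision coding gain 10 log10(R_c d_min - k ln 2 / gamma_b)."""
    code_rate = k / n
    return 10 * log10(code_rate * d_min - k * log(2) / gamma_b)


# (24, 12) extended Golay code, d_min = 8, at gamma_b = 10 (linear).
golay_gain_db = block_code_gain_db(24, 12, 8, 10.0)
```

Here $R_c d_{min} = 4$ and $k \ln 2/\gamma_b \approx 0.83$, so the argument of the logarithm is about 3.17 and the gain is about 5.0 dB. The bandwidth expansion is $1/R_c = 2$.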

In this section, we consider the use of coded signals for 
bandwidth-constrained channels. For such channels, the digital communications system is 
designed to use bandwidth-efficient multilevel/phase modulation, such as 
PAM, PSK, DPSK, or QAM, and operates in the region where $R/W > 1$. 
When coding is applied to the bandwidth-constrained channel, a performance 
gain is desired without expanding the signal bandwidth. This goal can be 
achieved by increasing the number of signals over the corresponding uncoded 
system to compensate for the redundancy introduced by the code. 

For example, suppose that a system employing uncoded four-phase PSK 
modulation achieves an R/W = 2 (bits/s)/Hz at an error probability of $10^{-6}$. 
For this error rate the SNR per bit is $\gamma_b = 10.5$ dB. We may try to reduce the 
SNR per bit by use of coded signals, but this must be done without expanding 
the bandwidth. If we choose a rate R c =2/3 code, it must be accompanied by 
an increase in the number of signal points from four (two bits per symbol) to 
eight (three bits per symbol). Thus, the rate 2/3 code used in conjunction with 
eight-phase PSK, for example, yields the same data throughput as uncoded 
four-phase PSK. However, we recall that an increase in the number of signal 
phases from four to eight requires an additional 4 dB approximately in signal 
power to maintain the same error rate. Hence, if coding is to provide a benefit, 
the performance gain of the rate 2/3 code must overcome this 4 dB penalty. 

If the modulation is treated as a separate operation independent of the 
encoding, the use of very powerful codes (large-constraint-length convolutional 
codes or large-block-length block codes) is required to offset the loss and 
provide some significant coding gain. On the other hand, if the modulation is 
an integral part of the encoding process and is designed in conjunction with the 
code to increase the minimum euclidean distance between pairs of coded 
signals, the loss from the expansion of the signal set is easily overcome and a 
significant coding gain is achieved with relatively simple codes. The key to this 
integrated modulation and coding approach is to devise an effective method for 
mapping the coded bits into signal points such that the minimum euclidean 
distance is maximized. Such a method was developed by Ungerboeck (1982), 
based on the principle of mapping by set partitioning. We describe this 
principle by means of two examples. 


Example 8-3-1: An 8-PSK Signal Constellation 

Let us partition the eight-phase signal constellation shown in Fig. 8-3-1 into 
subsets of increasing minimum euclidean distance. In the eight-phase signal 
set, the signal points are located on a circle of radius $\sqrt{\mathcal{E}}$ and have a 
minimum distance separation of 

$$d_0 = 2\sqrt{\mathcal{E}} \sin\frac{\pi}{8} = \sqrt{(2 - \sqrt{2})\mathcal{E}} = 0.765\sqrt{\mathcal{E}}$$

In the first partitioning, the eight points are subdivided into two subsets of 
four points each, such that the minimum distance between points increases 
to $d_1 = \sqrt{2\mathcal{E}}$. In the second level of partitioning, each of the two subsets is 
subdivided into two subsets of two points, such that the minimum distance 
increases to $d_2 = 2\sqrt{\mathcal{E}}$. This results in four subsets of two points each. 

FIGURE 8-3-1 

Finally, the last stage of partitioning leads to eight subsets, where each subset 
contains a single point. Note that each level of partitioning increases the 
minimum euclidean distance between signal points. The results of these three 
stages of partitioning are illustrated in Fig. 8-3-1. The way in which the coded 
bits are mapped into the partitioned signal points is described below. 
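The distances $d_0$, $d_1$, and $d_2$ of Example 8-3-1 can be confirmed numerically. The sketch below is an illustration with hypothetical function names. It assumes the points are numbered consecutively around the circle, so that the subsets at partition level L consist of every $2^L$-th point.

```python
from itertools import combinations
from math import cos, hypot, pi, sin


def psk_points(m=8, energy=1.0):
    """M-PSK points on a circle of radius sqrt(energy), numbered around the circle."""
    radius = energy ** 0.5
    return [(radius * cos(2 * pi * i / m), radius * sin(2 * pi * i / m))
            for i in range(m)]


def min_distance(points):
    """Smallest euclidean distance between any two points of a set."""
    return min(hypot(a[0] - b[0], a[1] - b[1]) for a, b in combinations(points, 2))


def partition_distances(points, levels):
    """Minimum intra-subset distance at each level; level L takes every 2^L-th point."""
    return [
        min(min_distance(points[start::2 ** level]) for start in range(2 ** level))
        for level in range(levels)
    ]
```

For unit energy, `partition_distances(psk_points(), 3)` returns approximately 0.765, 1.414, and 2.0, that is $d_0 = \sqrt{(2 - \sqrt{2})\mathcal{E}}$, $d_1 = \sqrt{2\mathcal{E}}$, and $d_2 = 2\sqrt{\mathcal{E}}$.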


Example 8-3-2: A 16-QAM Signal Constellation 

FIGURE 8-3-2 Set partitioning of 16-QAM signal. 

The 16-point rectangular signal constellation shown in Fig. 8-3-2 is first 
divided into two subsets by assigning alternate points to each subset as 
illustrated in the figure. Thus, the distance between points is increased from 
$2\sqrt{\mathcal{E}}$ to $2\sqrt{2\mathcal{E}}$ by the first partitioning. Further partitioning of the two 
subsets leads to greater separation in euclidean distance between signal 
points as illustrated in Fig. 8-3-2. It is interesting to note that for the 
rectangular signal constellations, each level of partitioning increases the 
minimum euclidean distance by $\sqrt{2}$, i.e., $d_{i+1}/d_i = \sqrt{2}$ for all i. 
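The $\sqrt{2}$ growth per partitioning level can be checked with a short sketch. The labeling used here to split the 4 x 4 grid is an assumption made for illustration (alternate points on a checkerboard at each level) and is not taken from Fig. 8-3-2; the function names are hypothetical. The grid spacing is $2\sqrt{\mathcal{E}}$, as in the example.

```python
from itertools import combinations
from math import hypot, sqrt


def qam16_labels(i, j):
    """Assumed partition label bits for grid position (i, j), 0 <= i, j <= 3."""
    return ((i + j) % 2, i % 2, (i // 2 + j // 2) % 2, (i // 2) % 2)


def partition_min_distances(energy=1.0):
    """Minimum intra-subset distance after 0, 1, 2, and 3 partitionings."""
    spacing = 2 * sqrt(energy)
    grid = [(i, j) for i in range(4) for j in range(4)]
    result = []
    for level in range(4):
        subsets = {}
        for i, j in grid:
            # Points sharing the first `level` label bits fall in the same subset.
            key = qam16_labels(i, j)[:level]
            subsets.setdefault(key, []).append((spacing * i, spacing * j))
        result.append(min(
            hypot(a[0] - b[0], a[1] - b[1])
            for pts in subsets.values()
            for a, b in combinations(pts, 2)
        ))
    return result
```

For unit energy the returned distances are 2, $2\sqrt{2}$, 4, and $4\sqrt{2}$, so each level multiplies the minimum distance by $\sqrt{2}$. After three partitionings there are eight subsets of two points each; a fourth would leave single points.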

In these two examples, the partitioning was carried out to the limit where 
each subset contains only a single point. In general, this may not be necessary. 
For example, the 16-point QAM signal constellation may be partitioned only 
twice, to yield four subsets of four points each. Similarly, the eight-phase PSK 
signal constellation can be partitioned twice, to yield four subsets of two points 
each. 

The degree to which the signal is partitioned depends on the characteristics 
of the code. In general, the encoding process is performed as illustrated in Fig. 
8-3-3. A block of m information bits is separated into two groups of length $k_1$ 
and $k_2$. The $k_1$ bits are encoded into n bits while the $k_2$ bits are left uncoded. 
Then, the n bits from the encoder are used to select one of the $2^n$ possible 
subsets in the partitioned signal set while the $k_2$ bits are used to select one of 
the $2^{k_2}$ signal points in each subset. When $k_2 = 0$, all m information bits are 
encoded. 

FIGURE 8-3-3 General structure of combined encoder/modulator. 

FIGURE 8-3-4 Four-state trellis-coded 8-PSK modulation. 


Example 8-3-3 

Consider the use of the rate 1/2 convolutional code shown in Fig. 8-3-4 to 
encode one information bit while the second information bit is left uncoded. 
When used in conjunction with an eight-point signal constellation, e.g., 
eight-phase PSK or eight-point QAM, the two encoded bits are used to 
select one of the four subsets in the signal constellation, while the remaining 
information bit is used to select one of the two points within each subset. 
In this case, $k_1 = 1$ and $k_2 = 1$. The four-state trellis, which is shown in Fig. 
8-3-4(b), is basically the trellis for the rate 1/2 convolutional encoder with the 
addition of parallel paths in each transition to accommodate the uncoded bit 
$c_3$. Thus, the coded bits $(c_1, c_2)$ are used to select one of the four subsets 
that contain two signal points each, while the uncoded bit is used to select 
one of the two signal points within each subset. Note that signal points 
within a subset are separated in distance by $d_2 = 2\sqrt{\mathcal{E}}$. Hence, the euclidean 
distance between parallel paths is $d_2$. The mapping of coded bits $(c_1, c_2, c_3)$ 
to signal points is illustrated in Fig. 8-3-4(c). As an alternative coding 
scheme, we may use a rate 2/3 convolutional encoder, and, thus, encode 
both information bits as shown in Fig. 8-3-5. This encoding leads to an 
eight-state trellis and results in better performance, but also requires a more 
complex implementation of the decoder as described below. 

FIGURE 8-3-5 Rate 2/3 convolutional encoder for encoding both information bits. 


Either block codes or convolutional codes may be used in conjunction with 
the partitioned signal constellation. In general, convolutional codes provide 
comparable coding gains to block codes and the availability of the Viterbi 
algorithm results in a simpler implementation for soft-decision decoding. For 
this reason, we limit our discussion to convolutional codes (linear trellis codes) 
and more generally to (nonlinear) trellis codes. 


Trellis-Coded Modulation Let us consider the use of the 8-PSK signal 
constellation in conjunction with trellis codes. Uncoded four-phase PSK 
(4-PSK) is used as a reference in measuring coding gain. Uncoded 4-PSK 
employs the signal points in either subset $B_0$ or $B_1$ of Fig. 8-3-1, for which the 
minimum distance of the signal points is $\sqrt{2\mathcal{E}}$. Note that this signal corresponds 
to a trivial one-state trellis with four parallel state transitions as shown 
in Fig. 8-3-6(a). The subsets $D_0$, $D_2$, $D_4$, and $D_6$ are used as the signal points 
for the purpose of illustration. 

For the coded 8-PSK modulation, we may use the four-state trellis shown in 
Fig. 8-3-6(b). Note that each branch in the trellis corresponds to one of the 
four subsets $C_0$, $C_1$, $C_2$, or $C_3$. For the eight-point constellation, each of the 
subsets $C_0$, $C_1$, $C_2$, and $C_3$ contains two signal points. Hence, the state 
transition $C_0$ contains the two signal points corresponding to the bits (000, 100) 
or (0, 4) in octal representation. Similarly, $C_2$ contains the two signal points 
corresponding to (010, 110), or (2, 6) in octal, $C_1$ contains the points 
corresponding to (001, 101), or (1, 5) in octal, and $C_3$ contains the points 
corresponding to (011, 111), or (3, 7) in octal. Thus, each transition in the 
four-state trellis contains two parallel paths, as shown in more detail in Fig. 
8-3-6(c). Note that any two signal paths that diverge from one state and 
remerge at the same state after more than one transition have a squared 
euclidean distance of $d_0^2 + 2d_1^2 = d_0^2 + d_2^2$ between them. For example, the 
signal paths 0, 0, 0 and 2, 1, 2 are separated by $d_0^2 + d_2^2 = [(0.765)^2 + 4]\mathcal{E} = 4.585\mathcal{E}$. 
On the other hand, the squared euclidean distance between parallel 
transitions is $d_2^2 = 4\mathcal{E}$. Hence, the minimum euclidean distance separation 
between paths that diverge from any state and remerge at the same state in the 
four-state trellis is $d_2 = 2\sqrt{\mathcal{E}}$. This minimum distance in the trellis code is called 
the free euclidean distance and denoted by $D_{fed}$. 

FIGURE 8-3-6 Uncoded 4-PSK and trellis-coded 8-PSK modulation. 

In the four-state trellis of Fig. 8-3-6(b), $D_{fed} = 2\sqrt{\mathcal{E}}$. When compared with 
the euclidean distance $d_0 = \sqrt{2\mathcal{E}}$ for the uncoded 4-PSK modulation, we 
observe that the four-state trellis code gives a coding gain of 3 dB. 
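
The distances quoted for the four-state code can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the signal energy is normalized to $\mathcal{E} = 1$, and the octal label $k$ is assumed to sit at phase $2\pi k/8$ (so that labels 0 and 4 are antipodal, as in the subset $C_0$). The helper names `psk_point` and `squared_path_distance` are illustrative only.

```python
import cmath
import math


def psk_point(label, num_points=8, energy=1.0):
    """8-PSK point for an integer label, assumed to lie at angle 2*pi*label/8."""
    return math.sqrt(energy) * cmath.exp(2j * math.pi * label / num_points)


def squared_path_distance(path_a, path_b):
    """Sum of squared euclidean distances between two label sequences."""
    return sum(abs(psk_point(a) - psk_point(b)) ** 2
               for a, b in zip(path_a, path_b))


# Paths 0,0,0 and 2,1,2: diverge, then remerge after three transitions.
d2_multi_branch = squared_path_distance((0, 0, 0), (2, 1, 2))
# Parallel transition inside one subset, e.g. C0 = {0, 4}.
d2_parallel = squared_path_distance((0,), (4,))
# Uncoded 4-PSK reference: neighbouring points are 90 degrees apart.
d2_uncoded_4psk = squared_path_distance((0,), (2,))

d2_free = min(d2_multi_branch, d2_parallel)
gain_db = 10 * math.log10(d2_free / d2_uncoded_4psk)

if __name__ == "__main__":
    print(f"multi-branch path: {d2_multi_branch:.3f} E")
    print(f"parallel transition: {d2_parallel:.3f} E")
    print(f"coding gain: {gain_db:.2f} dB")
```

With these assumptions the parallel transitions, not the longer paths, set the free distance, which is why the gain over uncoded 4-PSK comes out at about 3 dB.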

We should emphasize that the four-state trellis code illustrated in Fig. 
8-3-6(b) is optimum in the sense that it provides the largest free euclidean 
distance. Clearly, many other four-state trellis codes can be constructed, 
including the one shown in Fig. 8-3-7, which consists of four distinct transitions 
from each state to all other states. However, neither this code nor any of the 
other possible four-state trellis codes gives a larger $D_{fed}$. 

The construction of the optimum four-state trellis code for the eight-point 
constellation was performed on the basis of the following heuristic rules: 

(a) Parallel transitions (when they occur) are assigned to signal points 
separated by the maximum euclidean distance, e.g., $d_2 = 2\sqrt{\mathcal{E}}$ for 8-PSK in the 
four subsets $C_0$, $C_1$, $C_2$, $C_3$. 

(b) The transition originating from and merging into any state is assigned 
the subsets ($C_0$, $C_2$) or ($C_1$, $C_3$), which have a maximum distance $d_1 = \sqrt{2\mathcal{E}}$. 

(c) The signal points should occur with equal frequency. 

FIGURE 8-3-7 An alternative four-state trellis code. 

Note that rules (a) and (b) guarantee that the euclidean distance associated 
with single and multiple paths that diverge from any state and remerge in that 
state exceeds the euclidean distance of uncoded 4-PSK. Rule (c) guarantees 
that the trellis code will have a regular structure. 

We should indicate that the specific mapping of coded bits into signal points, 
as illustrated in Fig. 8-3-1, where the eight signal points are represented in an 
equivalent binary form, is not important. Other mappings can be devised by 
permuting subsets in a way that preserves the main property of increased 
minimum distance among the subsets. 

In the four-state trellis code, the parallel transitions were separated by the 
euclidean distance $2\sqrt{\mathcal{E}}$, which is also $D_{fed}$. Hence, the coding gain of 3 dB is 
limited by the distance of the parallel transitions. Larger gains in performance 
relative to uncoded 4-PSK can be achieved by using trellis codes with more 
states, which allow for the elimination of the parallel transitions. Thus, trellis 
codes with eight or more states would use distinct transitions to obtain a larger 
$D_{fed}$. 

For example, in Fig. 8-3-8, we illustrate an eight-state trellis code due to 
Ungerboeck (1982) for the 8-PSK signal constellation. The state transitions for 
maximizing the free euclidean distance were determined from application of 
the three basic rules given above. In this case, note that the minimum squared 
euclidean distance is 

$$D_{fed}^2 = d_0^2 + 2d_1^2 = 4.585\,\mathcal{E}$$

which, when compared with $d_0^2 = 2\mathcal{E}$ for uncoded 4-PSK, represents a gain of 
3.6 dB. Ungerboeck (1982, 1987) has also found rate 2/3 trellis codes with 16, 
32, 64, 128, and 256 states that achieve coding gains ranging from 4 to 5.75 dB 
for 8-PSK modulation. 
FIGURE 8-3-8 Eight-state trellis code for coded 8-PSK modulation. 


The basic principle of set partitioning is easily extended to larger PSK signal 
constellations that yield greater bandwidth efficiency. For example, 
3 (bits/s)/Hz can be achieved with either uncoded 8-PSK or with trellis-coded 
16-PSK modulation. Ungerboeck (1987) has devised trellis codes and has 
evaluated the coding gains achieved by simple rate 1/2 and rate 2/3 
convolutional codes for the 16-PSK signal constellations. The results are 
summarized below. 

Soft-decision Viterbi decoding for trellis-coded modulation is accomplished 
in two steps. Since each branch in the trellis corresponds to a signal subset, the 
first step in decoding is to determine the best signal point within each subset, 
i.e., the point in each subset that is closest in distance to the received point. We 
may call this subset decoding. In the second step, the signal point selected from 
each subset and its squared distance metric are used for the corresponding 
branch in the Viterbi algorithm to determine the signal path through the code 
trellis that has the minimum sum of squared distances from the sequence of 
received (noisy channel output) signals. 
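
The first of the two decoding steps (subset decoding) can be sketched as follows. The sketch assumes unit-energy 8-PSK with label $k$ at phase $2\pi k/8$ and the subsets $C_0 = \{0, 4\}$, $C_1 = \{1, 5\}$, $C_2 = \{2, 6\}$, $C_3 = \{3, 7\}$ given earlier; the function name `subset_decode` is illustrative. The second step, the Viterbi search over the code trellis, is not shown, since it depends on the particular trellis of the code.

```python
import cmath
import math

# Subsets of the partitioned 8-PSK constellation, by octal label.
SUBSETS = {"C0": (0, 4), "C1": (1, 5), "C2": (2, 6), "C3": (3, 7)}


def psk_point(label):
    """Unit-energy 8-PSK point, assuming label k sits at angle 2*pi*k/8."""
    return cmath.exp(2j * math.pi * label / 8)


def subset_decode(received):
    """For each subset, return (closest label, squared distance to it).

    The squared distances are the branch metrics that would be passed
    to the Viterbi algorithm for the branches carrying each subset.
    """
    metrics = {}
    for name, labels in SUBSETS.items():
        best = min(labels, key=lambda k: abs(received - psk_point(k)) ** 2)
        metrics[name] = (best, abs(received - psk_point(best)) ** 2)
    return metrics


if __name__ == "__main__":
    r = complex(0.9, 0.1)  # hypothetical noisy received sample
    for name, (label, metric) in subset_decode(r).items():
        print(f"{name}: closest label {label}, squared distance {metric:.3f}")
```

Because the best point within each subset is fixed before the trellis search, the parallel transitions add no extra branches to the Viterbi computation itself.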

The error rate performance of the trellis-coded signals in the presence of 
additive gaussian noise can be evaluated by following the procedure described 
in Section 8-2 for convolutional codes. Recall that this procedure involves the 
computation of the probability of error for all different error events and 
summing these error event probabilities to obtain a union bound on the 
first-event error probability. Note, however, that, at high SNR, the first-event 
error probability is dominated by the leading term, which has the minimum 
distance $D_{fed}$. Consequently, at high SNR, the first-event error probability is 
well approximated as 

$$P_e \approx N_{fed}\, Q\left(\sqrt{\frac{D_{fed}^2}{2N_0}}\right)$$

where $N_{fed}$ denotes the number of signal sequences with distance $D_{fed}$ that 
diverge at any state and remerge at that state after one or more transitions. 
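
A numerical sketch of this high-SNR approximation is given below. It assumes the usual form $P_e \approx N_{fed}\,Q\big(\sqrt{D_{fed}^2/2N_0}\big)$, with $Q(x)$ the gaussian tail function evaluated through the complementary error function; the function names and the sample values are illustrative, not taken from the text.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def first_event_error_approx(n_fed, d2_fed, n0):
    """High-SNR approximation N_fed * Q(sqrt(D_fed^2 / (2 * N0))).

    n_fed  : number of sequences at the free euclidean distance
    d2_fed : squared free euclidean distance D_fed^2
    n0     : noise power spectral density N0 (same units as d2_fed)
    """
    return n_fed * q_function(math.sqrt(d2_fed / (2 * n0)))


if __name__ == "__main__":
    # Hypothetical example: D_fed^2 = 4 (E = 1) and a few noise levels.
    for n0 in (0.5, 0.25, 0.1):
        p = first_event_error_approx(n_fed=1, d2_fed=4.0, n0=n0)
        print(f"N0 = {n0}: P_e is approximately {p:.3e}")
```

The multiplier `n_fed` only scales the result, whereas `d2_fed` enters through the argument of $Q$, which is why the distance usually dominates at high SNR.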

In computing the coding gain achieved by trellis-coded modulation, we 
usually focus on the gain achieved by increasing $D_{fed}$ and neglect the effect of 
$N_{fed}$. However, trellis codes with a large number of states may result in a large 
$N_{fed}$ that cannot be ignored in assessing the overall coding gain. 

In addition to the trellis-coded PSK modulations described above, powerful 
trellis codes have also been developed for PAM and QAM signal constellations. 
Of particular practical importance is the class of trellis-coded two-dimensional 
rectangular signal constellations. Figure 8-3-9 illustrates these 
signal constellations for M-QAM where M = 16, 32, 64, and 128. The M = 32 
and 128 constellations have a cross pattern and are sometimes called 
cross-constellations. The underlying rectangular grid containing the signal 
points in M-QAM is called a lattice of type $Z_2$ (the subscript indicates the 
dimensionality of the space). When set partitioning is applied to this class of 
signal constellations, the minimum euclidean distance between successive 
partitions is $d_{i+1}/d_i = \sqrt{2}$ for all $i$, as previously observed in Example 8-3-2. 


FIGURE 8-3-9 Rectangular two-dimensional (QAM) signal constellations. 

FIGURE 8-3-10 Eight-state trellis for rectangular QAM signal constellations. 
Figure 8-3-10 illustrates an eight-state trellis code that can be used with any 
of the M-QAM rectangular signal constellations for which $M = 2^k$, where 
$k = 4, 5, 6, \ldots$. With the eight-state trellis, we associate eight signal 
subsets, so that any of the M-QAM signal sets for $M \ge 16$ are suitable. For 
$M = 2^{m+1}$, two input bits ($k_1 = 2$) are encoded into $n = 3$ ($n = k_1 + 1$) bits that 
are used to select one of the eight states corresponding to the eight subsets. 
The additional $k_2 = m - k_1$ input bits are used to select signal points within a 
subset, and result in parallel transitions in the eight-state trellis. Hence, 
16-QAM involves two parallel transitions in each branch of the trellis. More 
generally, the choice of an $M = 2^{m+1}$-point QAM signal constellation implies 
that the eight-state trellis contains $2^{m-2}$ parallel transitions in each branch. 
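
The bookkeeping in this paragraph is easy to tabulate. The short Python sketch below assumes the relations $M = 2^{m+1}$, $k_2 = m - k_1$, and $2^{k_1+1}$ subsets with $2^{k_2}$ parallel transitions per branch; the function name `qam_tcm_parameters` is illustrative.

```python
def qam_tcm_parameters(M, k1=2):
    """Bit allocation for trellis-coded M-QAM with k1 coded input bits.

    Assumes M = 2**(m + 1), so m information bits are sent per symbol:
    k1 of them are encoded into k1 + 1 bits that select a subset, and
    the remaining k2 = m - k1 bits select a point inside the subset.
    """
    exponent = M.bit_length() - 1
    if M < 16 or 2 ** exponent != M:
        raise ValueError("M must be a power of two and at least 16")
    m = exponent - 1
    k2 = m - k1
    return {
        "m": m,
        "k1": k1,
        "k2": k2,
        "subsets": 2 ** (k1 + 1),
        "parallel_transitions": 2 ** k2,
    }


if __name__ == "__main__":
    for M in (16, 32, 64, 128):
        print(M, qam_tcm_parameters(M))
```

For $k_1 = 2$ the number of parallel transitions equals $M/8$, the number of signal points in each of the eight subsets.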

The assignment of signal subsets to transitions is based on the same set of 
basic (heuristic) rules described above for the 8-PSK signal constellation. Thus, 
the four (branches) transitions originating from or leading to the same state are 
assigned either the subsets $D_0$, $D_2$, $D_4$, $D_6$ or $D_1$, $D_3$, $D_5$, $D_7$. Parallel 
transitions are assigned signal points contained within the corresponding 
subsets. This eight-state trellis code provides a coding gain of 4 dB. The 
euclidean distance of parallel transitions exceeds the free euclidean distance, 
and, hence, the code performance is not limited by parallel transitions. 

Larger size trellis codes for M-QAM provide even larger coding gains. For 
example, trellis codes with $2^{\nu}$ states for an $M = 2^{m+1}$ QAM signal constellation 
can be constructed by convolutionally encoding $k_1$ input bits into $k_1 + 1$ output 
bits. Thus, a rate $R_c = k_1/(k_1 + 1)$ convolutional code is employed for this 
purpose. Usually, the choice of $k_1 = 2$ provides a significant fraction of the total 
coding gain that is achievable. The additional $k_2 = m - k_1$ input bits are 
uncoded, and are transmitted in each signal interval by selecting signal points 
within a subset. 

TABLE 8-3-1 CODING GAINS FOR TRELLIS CODED PAM SIGNALS 

Number            Code rate        m = 1             m = 2             m → ∞         m → ∞
of states $k_1$   $k_1/(k_1+1)$    coding gain (dB)  coding gain (dB)  asymptotic    $N_{fed}$
                                   of 4-PAM versus   of 8-PAM versus   coding gain
                                   uncoded 2-PAM     uncoded 4-PAM     (dB)

4         1       1/2              2.55              3.31              3.52          4
8         1       1/2              3.01              3.77              3.97          4
16        1       1/2              3.42              4.18              4.39          8
32        1       1/2              4.15              4.91              5.11          12
64        1       1/2              4.47              5.23              5.44          36
128       1       1/2              5.05              5.81              6.02          66

Source: Ungerboeck (1987). 

Tables 8-3-1 to 8-3-3, taken from the paper by Ungerboeck (1987), provide 
a summary of coding gains achievable with trellis-coded modulation. Table 
8-3-1 summarizes the coding gains achieved for trellis-coded (one-dimensional) 
PAM modulation with rate 1/2 trellis codes. Note that the coding gain with a 
128-state trellis code is 5.8 dB for octal PAM, which is close to the channel 
cutoff rate $R_0$ and less than 4 dB from the channel capacity limit for error rates 
in the range of $10^{-6}$ to $10^{-8}$. We should also observe that the number of paths 
$N_{fed}$ with free euclidean distance $D_{fed}$ becomes large with an increase in the 
number of states. 

TABLE 8-3-2 CODING GAINS FOR TRELLIS CODED 16-PSK MODULATION 

Number            Code rate        m = 3             m → ∞
of states $k_1$   $k_1/(k_1+1)$    coding gain (dB)  $N_{fed}$
                                   of 16-PSK versus
                                   uncoded 8-PSK

4         1       1/2              3.54              4
8         1       1/2              4.01              4
16        1       1/2              4.44              8
32        1       1/2              5.13              8
64        1       1/2              5.33              2
128       1       1/2              5.33              2
256       2       2/3              5.51              8

Source: Ungerboeck (1987). 

TABLE 8-3-3 CODING GAINS FOR TRELLIS CODED QAM MODULATION 

Number            Code rate        m = 3           m = 4           m = 5           m = ∞
of states $k_1$   $k_1/(k_1+1)$    gain (dB) of    gain (dB) of    gain (dB) of    asymptotic    $N_{fed}$
                                   16-QAM versus   32-QAM versus   64-QAM versus   coding
                                   uncoded 8-QAM   uncoded 16-QAM  uncoded 32-QAM  gain (dB)

4         1       1/2              3.01            3.01            2.80            3.01          4
8         2       2/3              3.98            3.98            3.77            3.98          16
16        2       2/3              4.77            4.77            4.56            4.77          56
32        2       2/3              4.77            4.77            4.56            4.77          16
64        2       2/3              5.44            5.44            5.23            5.44          56
128       2       2/3              6.02            6.02            5.81            6.02          344
256       2       2/3              6.02            6.02            5.81            6.02          44

Source: Ungerboeck (1987). 

Table 8-3-2 lists the coding gain for trellis-coded 16-PSK. Again, we observe 
that the coding gain for eight or more trellis states exceeds 4 dB, relative to 
uncoded 8-PSK. A simple rate 1/2 code yields 5.33 dB gain with a 128-state 
trellis. 

Table 8-3-3 contains the coding gains obtained with trellis-coded QAM 
signals. Relatively simple rate 2/3 trellis codes yield a gain of 6 dB with 128 
trellis states for m = 3 and 4. 

The results in these tables clearly illustrate the significant coding gains that 
are achievable with relatively simple trellis codes. A 6 dB coding gain is close 
to the cutoff rate R 0 for the signal sets under consideration. Additional gains 
that would lead to transmission in the vicinity of the channel capacity bound 
are difficult to attain without a significant increase in coding/decoding 
complexity. 

Since the channel capacity provides the ultimate limit on code performance, 
we should emphasize that continued partitioning of large signal sets quickly 
leads to signal point separation within any subset that exceeds the free 
euclidean distance of the code. In such cases, parallel transitions are no longer 
the limiting factor on $D_{fed}$. Usually, a partition to eight subsets is sufficient to 
obtain a coding gain of 5-6 dB with simple rate 1/2 or rate 2/3 trellis codes 
with either 64 or 128 trellis states, as indicated in Tables 8-3-1 to 8-3-3. 

Convolutional encoders for the linear trellis codes listed in Tables 8-3-1 to 
8-3-3 for the M-PAM, M-PSK, and M-QAM signal constellations are given in 
the papers by Ungerboeck (1982, 1987). The encoders may be realized either 
with feedback or without feedback. For example, Fig. 8-3-11 illustrates three 
feedback-free convolutional encoders corresponding to 4-, 8-, and 16-state 
trellis codes for 8-PSK and 16-QAM signal constellations. Equivalent realizations 
of these trellis codes based on systematic convolutional encoders with 
feedback are shown in Fig. 8-3-12. Usually, the systematic convolutional 
encoders are preferred in practical applications. 

FIGURE 8-3-11 Minimal feedback-free convolutional encoders for 8-PSK and 16-QAM signals. [From Ungerboeck 
(1982). © 1982 IEEE.] 

A potential problem with linear trellis codes is that the modulated signal 
sets are not usually invariant to phase rotations. This poses a problem in 
practical applications where differential encoding is usually employed to avoid 
phase ambiguities when a receiver must recover the carrier phase after a 
temporary loss of signal. The problem of phase invariance and differential 
encoding/decoding was solved by Wei (1984a, b), who devised linear and 
nonlinear trellis codes that are rotationally invariant under either 180° or 90° 
phase rotations, respectively. For example, Fig. 8-3-13 illustrates a nonlinear 
eight-state convolutional encoder for a 32-QAM rectangular signal constella- 
tion that is invariant under 90° phase rotations. This trellis code has been 
adopted as an international standard for 9600 and 14,400 bits/s (high-speed) 
telephone line modems. 

Trellis-coded modulation schemes have also been developed for multidimensional 
signals. In practical systems, multidimensional signals are transmitted 
as a sequence of either one-dimensional (PAM) or two-dimensional 
(QAM) signals. Trellis codes based on 4-, 8-, and 16-dimensional signal 
constellations have been constructed, and some of these codes have been 
implemented in commercially available modems. A potential advantage of 
trellis-coded multidimensional signals is that we can use smaller constituent 
two-dimensional signal constellations that allow for a trade-off between coding 
gain and implementation complexity. The papers by Wei (1987), Ungerboeck 
(1987), Gersho and Lawrence (1984), and Forney et al. (1984) treat 
multidimensional signal constellations for trellis-coded modulation. 

FIGURE 8-3-12 Equivalent realizations of systematic convolutional encoders with feedback for 8-PSK and 
16-QAM. [From Ungerboeck (1982). © 1982 IEEE.] 

Finally, we should mention that a new design technique for trellis-coded 
modulation based on lattices and cosets of a sublattice has been described by 
Calderbank and Sloane (1987) and Forney (1988). This method for 
constructing trellis codes provides an alternative to the set partitioning method 
described above. However, the two methods are closely related. In this 
alternative method, a block of $k_1$ bits is fed to a convolutional encoder. Each 
block of $k_1$ input bits produces an output symbol that is a coset of the 
sublattice $\Lambda'$, which is a subset of the chosen lattice. A second block of $k_2$ input 
bits is used to select one of the points in the coset at the output of the 
convolutional encoder. It is apparent that the cosets of the sublattice are akin 
to the subsets in set partitioning and the elements of the cosets are akin to the 
signal points within a subset. This new method has led to the discovery of new 
powerful trellis codes involving larger signal constellations, many of which are 
listed in the paper by Calderbank and Sloane (1987). 
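
The analogy between cosets and subsets can be made concrete with a toy example. The Python sketch below is an illustration under assumptions not taken from the text: the lattice is taken to be the integer grid $Z^2$, the sublattice $\Lambda'$ is taken to be $2Z^2$ (points with both coordinates even), and the constellation is a $4 \times 4$ patch of the grid. The names `coset_label`, `group_into_cosets`, and `select_point` are hypothetical.

```python
import itertools
import math


def coset_label(point):
    """Coset of 2Z^2 containing an integer point: coordinates modulo 2."""
    x, y = point
    return (x % 2, y % 2)


def group_into_cosets(points):
    """Group constellation points by the coset of the sublattice they lie in."""
    groups = {}
    for p in points:
        groups.setdefault(coset_label(p), []).append(p)
    return groups


def select_point(groups, coset, index):
    """Coded bits would pick `coset`; uncoded bits would pick `index`."""
    return groups[coset][index]


def min_distance(points):
    """Smallest euclidean distance between any two points."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))


if __name__ == "__main__":
    constellation = [(x, y) for x in range(4) for y in range(4)]
    groups = group_into_cosets(constellation)
    print("minimum distance, full set:", min_distance(constellation))
    for label, members in sorted(groups.items()):
        print(label, members, "minimum distance:", min_distance(members))
```

In this toy case the four cosets play the role of four subsets of four points each, and the spacing inside each coset is twice the spacing of the full grid.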
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8-4 BIBLIOGRAPHICAL NOTES AND REFERENCES 

The pioneering work on coding and coded waveforms for digital communica- 
tions was done by Shannon (1948a, b), Hamming (1950), and Golay (1949). 
These works were rapidly followed with papers on code performance by 
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Gilbert (1952), new codes by Muller (1954) and Reed (1954), and coding 
techniques for noisy channels by Elias (1954, 1955) and Slepian (1956). 

During the period 1960-1970, there were a number of significant contribu- 
tions in the development of coding theory and decoding algorithms. In 
particular, we cite the papers by Reed and Solomon (1960) on Reed-Solomon 
codes, the papers by Hocquenghem (1959) and Bose and Ray-Chaudhuri 
(1960a, b) on BCH codes, and the Ph.D. dissertation of Forney (1966a) on 
concatenated codes. These works were followed by the papers of Goppa (1970, 
1971) on the construction of a new class of linear cyclic codes, now called 
Goppa codes (see also Berlekamp, 1973), and the paper of Justesen (1972) on 
a constructive technique for asymptotically good codes. During this period, 
work on decoding algorithms was primarily focused on BCH codes. The first 
decoding algorithm for binary BCH codes was developed by Peterson (1960). 
A number of refinements and generalizations by Chien (1964), Forney (1965), 
Massey (1965), and Berlekamp (1968) led to the development of a computa- 
tionally efficient algorithm for BCH codes, which is described in detail by Lin 
and Costello (1983). 

In parallel with these developments on block codes are the developments in 
convolutional codes, which were invented by Elias (1955). The major problem 
in convolutional coding was decoding. Wozencraft and Reiffen (1961) de- 
scribed a sequential decoding algorithm for convolutional codes. This algo- 
rithm was later modified and refined by Fano (1963), and it is now called the 
Fano algorithm. Subsequently, the stack algorithm was devised by Zigangirov 
(1966) and Jelinek (1969), and the Viterbi algorithm was devised by Viterbi 
(1967). The optimality and the relatively modest complexity for small 
constraint lengths have served to make the Viterbi algorithm the most popular 
in decoding of convolutional codes with $K \le 10$. 

One of the most important contributions in coding during the 1970s was the 
work of Ungerboeck and Csajka (1976) on coding for bandwidth-constrained 
channels. In this paper, it was demonstrated that a significant coding gain can 
be achieved through the introduction of redundancy in a bandwidth- 
constrained channel and trellis codes were described for achieving coding gains 
of 3-4 dB. This work has generated much interest among researchers and has 
led to a large number of publications over the past 10 years. A number of 
references can be found in the papers by Ungerboeck (1982, 1987) and Forney 
et al. (1984). Additional papers on coded modulation for bandwidth- 
constrained channels may also be found in the Special Issue on Voiceband 
Telephone Data Transmission, IEEE Journal on Selected Areas in Communications 
(September 1984). A comprehensive treatment of trellis-coded 
modulation is given in the book by Biglieri et al. (1991). 

In addition to the references given above on coding, decoding, and coded 
signal design, we should mention the collection of papers published by the 
IEEE Press entitled Key Papers in the Development of Coding Theory, edited 
by Berlekamp (1974). This book contains important papers that were 
published in the first 25 years of coding theory. We should also cite the Special 
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Issue on Error-Correcting Codes, IEEE Transactions on Communications 
(October 1971). 


PROBLEMS 


8-1 The generator matrix for a linear binary code is 


$$G = \begin{bmatrix} 0 & 0 & 1 & 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 & 1 & 0 \end{bmatrix}$$

a Express G in systematic [I | P] form. 
b Determine the parity check matrix H for the code. 
c Construct the table of syndromes for the code. 
d Determine the minimum distance of the code. 
e Demonstrate that the code word corresponding to the information sequence 101 
is orthogonal to H. 

8-2 List the code words generated by the matrices given in (8-1-35) and (8-1-37), and, 
thus, demonstrate that these matrices generate the same set of code words. 

8-3 The weight distribution of Hamming codes is known. Expressed as a polynomial in 
powers of x, the weight distribution for the binary Hamming codes of block length 
n is 


$$A(x) = \sum_{i=0}^{n} A_i x^i = \frac{1}{n+1}\left[(1+x)^n + n(1+x)^{(n-1)/2}(1-x)^{(n+1)/2}\right]$$

where $A_i$ is the number of code words of weight $i$. Use this formula to determine 
the weight distribution of the (7, 4) Hamming code and check your result with the 
list of code words given in Table 8-1-2. 

8-4 The polynomial 

$$g(p) = p^4 + p + 1$$

is the generator for the (15, 11) Hamming binary code, 
a Determine a generator matrix G for this code in systematic form, 
b Determine the generator polynomial for the dual code. 

8-5 For the (7, 4) cyclic Hamming code with generator polynomial $g(p) = p^3 + p^2 + 1$, 
construct an (8, 4) extended Hamming code and list all the code words. What is 
$d_{min}$ for the extended code? 

8-6 An (8, 4) linear block code is constructed by shortening a (15, 11) Hamming code 
generated by the generator polynomial $g(p) = p^4 + p + 1$. 
a Construct the code words of the (8, 4) code and list them. 
b What is the minimum distance of the (8, 4) code? 

8-7 The polynomial $p^{15} + 1$ when factored yields 

$$p^{15} + 1 = (p^4 + p^3 + 1)(p^4 + p^3 + p^2 + p + 1)(p^4 + p + 1)(p^2 + p + 1)(p + 1)$$

a Construct a systematic (15, 5) code using the generator polynomial 

$$g(p) = (p^4 + p^3 + p^2 + p + 1)(p^4 + p + 1)(p^2 + p + 1)$$

b What is the minimum distance of the code? 
c How many random errors per code word can be corrected? 
d How many errors can be detected by this code? 
e List the code words of a (15, 2) code constructed from the generator polynomial 

$$g(p) = (p^{15} + 1)/(p^2 + p + 1)$$

and determine the minimum distance. 

8-8 Construct the parity check matrices $H_1$ and $H_2$ corresponding to the generator 
matrices $G_1$ and $G_2$ given by (8-1-34) and (8-1-35), respectively. 

8-9 Construct an extended (8,4) code from the (7,4) Hamming code by specifying the 
generator matrix and the parity check matrix. 

8-10 A systematic (6, 3) code has the generator matrix 

$$G = \begin{bmatrix} 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 & 1 \end{bmatrix}$$


Construct the standard array and determine the correctable error patterns and 
their corresponding syndromes. 

8-11 Construct the standard array for the (7, 3) code with generator matrix 


$$G = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{bmatrix}$$



and determine the correctable patterns and their corresponding syndromes. 

8-12 Determine the correctable error patterns (of least weight) and their syndromes for 
the systematic (7, 4) cyclic Hamming code. 

8-13 Prove that if the sum of two error patterns $e_1$ and $e_2$ is a valid code word $C_j$, then 
each pattern has the same syndrome. 

8-14 Let $g(p) = p^8 + p^6 + p^4 + p^2 + 1$ be a polynomial over the binary field. 

a Find the lowest-rate cyclic code whose generator polynomial is g(p). What is 
the rate of this code? 

b Find the minimum distance of the code found in (a). 
c What is the coding gain for the code found in (a)? 

8-15 The polynomial $g(p) = p + 1$ over the binary field is considered. 

a Show that this polynomial can generate a cyclic code for any choice of n. Find 
the corresponding k. 

b Find the systematic form of G and H for the code generated by g(p). 
c Can you say what type of code this generator polynomial generates? 

8-16 Design a (6, 2) cyclic code by choosing the shortest possible generator polynomial, 
a Determine the generator matrix G (in the systematic form) for this code and 
find all possible code words, 
b How many errors can be corrected by this code? 

8-17 Prove that any two n-tuples in the same row of a standard array add to produce a 
valid code word. 

8-18 Beginning with a (15, 7) BCH code, construct a shortened (12, 4) code. Give the 
generator matrix for the shortened code. 

8-19 In Section 8-1-2, it was indicated that when an (n, k) Hadamard code is mapped 
into waveforms by means of binary PSK, the corresponding $M = 2^k$ waveforms 
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are orthogonal. Determine the bandwidth expansion factor for the M orthogonal 
waveforms and compare this with the bandwidth requirements of orthogonal FSK 
detected coherently. 

8-20 Show that the signaling waveforms generated from a maximum-length 
shift-register code by mapping each bit in a code word into a binary PSK signal are 
equicorrelated with correlation coefficient $\rho_r = -1/(M - 1)$, i.e., the M waveforms 
form a simplex set. 

8-21 Compute the error probability obtained with a (7,4) Hamming code on an 
AWGN channel, both for hard-decision and soft-decision decoding. Use (8-1-50), 
(8-1-52), (8-1-82), (8-1-90), and (8-1-91). 

8-22 Use the results in Section 2-1-6 to obtain the Chernoff bound for hard-decision 
decoding given by (8-1-89) and (8-1-90). Assume that the all-zero code word is 
transmitted and determine an upper bound on the probability that code word $C_m$, 
having weight $w_m$, is selected. This occurs if $\frac{1}{2}w_m$ or more bits are in error. To 
apply the Chernoff bound, define a sequence of $w_m$ random variables as 

$$X_i = \begin{cases} 1 & \text{with probability } p \\ -1 & \text{with probability } 1 - p \end{cases}$$

where $i = 1, 2, \ldots, w_m$, and $p$ is the probability of error. For the BSC, the $\{X_i\}$ are 
statistically independent. 

8-23 A convolutional code is described by 

$$g_1 = [1\ 0\ 0], \quad g_2 = [1\ 0\ 1], \quad g_3 = [1\ 1\ 1]$$

a Draw the encoder corresponding to this code. 
b Draw the state-transition diagram for this code. 
c Draw the trellis diagram for this code. 
d Find the transfer function and the free distance of this code. 
e Verify whether or not this code is catastrophic. 

8-24 The convolutional code of Problem 8-23 is used for transmission over an AWGN 
channel with hard-decision decoding. The output of the demodulator detector is 
(101001011110111...). Using the Viterbi algorithm, find the transmitted sequence. 

8-25 Repeat Problem 8-23 for a code with 

$$g_1 = [1\ 1\ 0], \quad g_2 = [1\ 0\ 1], \quad g_3 = [1\ 1\ 1]$$

8-26 The block diagram of a binary convolutional code is shown in Fig. P8-26. 
a Draw the state diagram for the code, 
b Find the transfer function of the code, T(D). 
c What is $d_{free}$, the minimum free distance of the code? 

FIGURE P8-26 
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FIGURE P8-27 



d Assume that a message has been encoded by this code and transmitted over a 
binary-symmetric channel with an error probability of p = 10 \ If the received 
sequence is r = (110, 110, 110, 111,010, 101, 101), using the Viterbi algorithm, 
find the transmitted bit sequence. 

e Find an upper bound to the bit error probability of the code when the above 
binary-symmetric channel is employed. Make any reasonable approximation. 

8-27 The block diagram of a (3, 1) convolutional code is shown in Fig. P8-27. 
a Draw the state diagram of the code, 
b Find the transfer function T(D) of the code. 

c Find the minimum free distance ($d_{\text{free}}$) of the code and show the corresponding
path (at distance $d_{\text{free}}$ from the all-zero code word) on the trellis.
d Assume that four information bits $(x_1, x_2, x_3, x_4)$, followed by two zero bits,
have been encoded and sent via a binary-symmetric channel with crossover
probability equal to 0.1. The received sequence is (111, 111, 111, 111, 111, 111).
Use the Viterbi decoding algorithm to find the most likely data sequence.

8-28 In the convolutional code generated by the encoder shown in Fig. P8-28. 
a Find the transfer function of the code in the form T(N, D). 
b Find $d_{\text{free}}$ of the code.

c If the code is used on a channel using hard-decision Viterbi decoding, assuming
the crossover probability of the channel is $p = 10^{-6}$, use the hard-decision bound
to find an upper bound on the average bit error probability of the code.


FIGURE P8-28 



8-29 Figure P8-29 depicts a rate 1/2, constraint length K = 2, convolutional code, 
a Sketch the tree diagram, the trellis diagram, and the state diagram, 
b Solve for the transfer function T(D,N, J) and, from this, specify the minimum 
free distance. 


FIGURE P8-30



8-30 A rate 1/2, K = 3, binary convolutional encoder is shown in Fig. P8-30.
a Draw the tree diagram, the trellis diagram, and the state diagram, 
b Determine the transfer function T(D, N, J) and, from this, specify the minimum 
free distance. 

8-31 Sketch the convolutional encoders for the following codes: 
a rate 1/2, K = 5, maximum free distance code (Table 8-2-1); 

b rate 1/3, K = 5, maximum free distance code (Table 8-2-2); 

c rate 2/3, K = 2, maximum free distance code (Table 8-2-8). 

8-32 Draw the state diagram for the rate 2/3, K = 2, convolutional code indicated in
Problem 8-31(c) and, for each transition, show the output sequence and the
distance of the output sequence from the all-zero sequence.

8-33 Consider the K =3, rate 1/2, convolutional code shown in Fig. P8-30. Suppose 
that the code is used on a binary symmetric channel and the received sequence for 
the first eight branches is 0001100000001001. Trace the decisions on a
trellis diagram and label the survivors’ Hamming distance metric at each node 
level. If a tie occurs in the metrics required for a decision, always choose the upper 
path (arbitrary choice). 

8-34 Use the transfer function derived in Problem 8-30 for the $R_c = 1/2$, K = 3,
convolutional code to compute the probability of a bit error for an AWGN 
channel with (a) hard-decision and (b) soft-decision decoding. Compare the 
performance by plotting the results of the computation on the same graph. 

8-35 Use the generators given by (8-2-36) to obtain the encoder for a dual-3, rate 1/2 
convolutional code. Determine the state diagram and derive the transfer function 
T(D, N, J). 

8-36 Draw the state diagram for the convolutional code generated by the encoder 
shown in Fig. P8-36 and, thus, determine if the code is catastrophic or 
noncatastrophic. Also, give an example of a rate 1/2, K = 4, convolutional encoder
that exhibits catastrophic error propagation. 

8-37 A trellis coded signal is formed as shown in Fig. P8-37 by encoding one bit by use
of a rate 1/2 convolutional code, while three additional information bits are left
uncoded. Perform the set partitioning of a 32-QAM (cross) constellation and
indicate the subsets in the partition. By how much is the distance between adjacent
signal points increased as a result of partitioning?

FIGURE P8-36
FIGURE P8-37

8-38 Let $x_1$ and $x_2$ be two code words of length $n$ with distance $d$ and assume that these
two code words are transmitted via a binary-symmetric channel with crossover
probability $p$. Let $P(d)$ denote the error probability in transmission of these two
code words.
a Show that

$$P(d) \le \sum_{i=1}^{2^n} \sqrt{p(y_i \mid x_1)\,p(y_i \mid x_2)}$$

where the summation is over all binary sequences $y_i$.
b From the above, conclude that

$$P(d) \le [4p(1 - p)]^{d/2}$$




SIGNAL DESIGN FOR 
BAND-LIMITED CHANNELS 


In previous chapters, we considered the transmission of digital information 
through an additive gaussian noise channel. In effect, no bandwidth constraint 
was imposed on the signal design and the communication system design. 

In this chapter, we consider the problem of signal design when the channel
is band-limited to some specified bandwidth W Hz. Under this condition, the
channel may be modeled as a linear filter having an equivalent lowpass
frequency response $C(f)$ that is zero for $|f| > W$.

The first topic that is treated is the design of the signal pulse $g(t)$ in a
linearly modulated signal, represented as

$$v(t) = \sum_n I_n g(t - nT)$$

that efficiently utilizes the total available channel bandwidth W. We shall see
that when the channel is ideal for $|f| \le W$, a signal pulse can be designed that
allows us to transmit at symbol rates comparable to or exceeding the channel
bandwidth W. On the other hand, when the channel is not ideal, signal
transmission at a symbol rate equal to or exceeding W results in intersymbol
interference (ISI) among a number of adjacent symbols.

The second topic that is treated in this chapter is the use of coding to shape 
the spectrum of the transmitted signal and, thus, to avoid the problem of ISI. 

We begin our discussion with a general characterization of band-limited, 
linear filter channels. 


9-1 CHARACTERIZATION OF BAND-LIMITED 
CHANNELS 

Of the various channels available for digital communications, telephone
channels are by far the most widely used. Such channels are characterized as
band-limited linear filters. This is certainly the proper characterization when
frequency-division multiplexing (FDM) is used as a means for establishing 
channels in the telephone network. Recent additions to the telephone network 
employ pulse-code modulation (PCM) for digitizing and encoding the analog 
signal and time-division multiplexing (TDM) for establishing multiple chan- 
nels. Nevertheless, filtering is still used on the analog signal prior to sampling 
and encoding. Consequently, even though the present telephone network 
employs a mixture of FDM and TDM for transmission, the linear filter model 
for telephone channels is still appropriate. 

For our purposes, a band-limited channel such as a telephone channel will
be characterized as a linear filter having an equivalent lowpass frequency
response characteristic $C(f)$. Its equivalent lowpass impulse response is
denoted by $c(t)$. Then, if a signal of the form

$$s(t) = \operatorname{Re}\left[v(t)e^{j2\pi f_c t}\right] \tag{9-1-1}$$

is transmitted over a bandpass telephone channel, the equivalent lowpass
received signal is

$$r_l(t) = \int_{-\infty}^{\infty} v(\tau)c(t - \tau)\,d\tau + z(t) \tag{9-1-2}$$

where the integral represents the convolution of $c(t)$ with $v(t)$, and $z(t)$
denotes the additive noise. Alternatively, the signal term can be represented in
the frequency domain as $V(f)C(f)$, where $V(f)$ is the Fourier transform of
$v(t)$.

If the channel is band-limited to W Hz then $C(f) = 0$ for $|f| > W$. As a
consequence, any frequency components in $V(f)$ above $|f| = W$ will not be
passed by the channel. For this reason, we limit the bandwidth of the
transmitted signal to W Hz also.

Within the bandwidth of the channel, we may express the frequency
response $C(f)$ as

$$C(f) = |C(f)|\,e^{j\theta(f)} \tag{9-1-3}$$

where $|C(f)|$ is the amplitude response characteristic and $\theta(f)$ is the phase
response characteristic. Furthermore, the envelope delay characteristic is
defined as

$$\tau(f) = -\frac{1}{2\pi}\frac{d\theta(f)}{df} \tag{9-1-4}$$


A channel is said to be nondistorting or ideal if the amplitude response $|C(f)|$ is
constant for all $|f| \le W$ and $\theta(f)$ is a linear function of frequency, i.e., $\tau(f)$ is a
constant for all $|f| \le W$. On the other hand, if $|C(f)|$ is not constant for all
$|f| \le W$, we say that the channel distorts the transmitted signal $V(f)$ in
amplitude, and, if $\tau(f)$ is not constant for all $|f| \le W$, we say that the channel
distorts the signal $V(f)$ in delay.

FIGURE 9-1-1 Effect of channel distortion: (a) channel input; (b) channel output; (c) equalizer output.

As a result of the amplitude and delay distortion caused by the nonideal
channel frequency response characteristic $C(f)$, a succession of pulses
transmitted through the channel at rates comparable to the bandwidth W are
smeared to the point that they are no longer distinguishable as well-defined
pulses at the receiving terminal. Instead, they overlap and, thus, we have
intersymbol interference. As an example of the effect of delay distortion on a
transmitted pulse, Fig. 9-1-1(a) illustrates a band-limited pulse having zeros
periodically spaced in time at points labeled $\pm T$, $\pm 2T$, etc. If information is
conveyed by the pulse amplitude, as in PAM, for example, then one can
transmit a sequence of pulses, each of which has a peak at the periodic zeros of
the other pulses. However, transmission of the pulse through a channel
modeled as having a linear envelope delay characteristic $\tau(f)$ [quadratic phase
$\theta(f)$] results in the received pulse shown in Fig. 9-1-1(b) having zero-crossings
that are no longer periodically spaced. Consequently, a sequence of successive
pulses would be smeared into one another and the peaks of the pulses would
no longer be distinguishable. Thus, the channel delay distortion results in
intersymbol interference. As will be discussed in Chapter 10, it is possible to
compensate for the nonideal frequency response characteristic of the channel
by use of a filter or equalizer at the demodulator. Figure 9-1-1(c) illustrates the
output of a linear equalizer that compensates for the linear distortion in the
channel.
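The relation between phase response and envelope delay in (9-1-4) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the bandwidth `W`, the delay `t0`, and the slope `a` are illustrative values chosen here and do not come from the text. A linear phase gives a constant envelope delay (no delay distortion), while a quadratic phase gives the linear envelope delay discussed above.

```python
import numpy as np

def envelope_delay(freqs_hz, phase_rad):
    """Numerically evaluate tau(f) = -(1/(2*pi)) * d(theta)/df."""
    return -np.gradient(phase_rad, freqs_hz) / (2.0 * np.pi)

# Illustrative (assumed) values, not taken from the text.
W = 3000.0                      # channel bandwidth in Hz
t0 = 1e-3                       # constant delay in seconds
a = 1e-7                        # delay slope in seconds per Hz
f = np.linspace(-W, W, 601)     # uniform frequency grid over |f| <= W

# Linear phase: the channel only delays the signal, so tau(f) is constant.
theta_linear = -2.0 * np.pi * t0 * f
# Quadratic phase: tau(f) = t0 + a*f, i.e. a linear envelope delay.
theta_quadratic = -2.0 * np.pi * (t0 * f + 0.5 * a * f**2)

tau_linear = envelope_delay(f, theta_linear)
tau_quadratic = envelope_delay(f, theta_quadratic)

print("linear phase:    delay spread over the band =", np.ptp(tau_linear))
print("quadratic phase: delay spread over the band =", np.ptp(tau_quadratic))
```

The delay spread printed for the quadratic-phase case is the variation of $\tau(f)$ across the band, which is what produces the displaced zero-crossings described for Fig. 9-1-1(b).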

FIGURE 9-1-2 Average amplitude and delay characteristics of medium-range telephone channel (both plotted against frequency in Hz).

The extent of the intersymbol interference on a telephone channel can be
appreciated by observing a frequency response characteristic of the channel.
Figure 9-1-2 illustrates the measured average amplitude and delay as functions 
of frequency for a medium-range (180-725 mi) telephone channel of the 
switched telecommunications network as given by Duffy and Thatcher (1971).
We observe that the usable band of the channel extends from about 300 Hz to 
about 3000 Hz. The corresponding impulse response of this average channel is 
shown in Fig. 9-1-3. Its duration is about 10 ms. In comparison, the transmitted 
symbol rates on such a channel may be of the order of 2500 pulses or symbols 
per second. Hence, intersymbol interference might extend over 20-30 symbols. 
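The estimate above is simply the impulse response duration multiplied by the symbol rate. A minimal sketch of that arithmetic follows; the function name `isi_span_symbols` is a hypothetical helper introduced here for illustration, and the two inputs are the figures quoted in the text.

```python
def isi_span_symbols(impulse_response_duration_s, symbol_rate_hz):
    """Approximate number of symbol intervals spanned by the channel response."""
    return impulse_response_duration_s * symbol_rate_hz

# Values quoted in the text for the average telephone channel:
# impulse response of about 10 ms, symbol rate of about 2500 symbols/s.
telephone_span = isi_span_symbols(10e-3, 2500.0)
print(f"telephone channel: ISI spans about {telephone_span:.0f} symbols")
```

The result falls in the 20-30 symbol range stated above; the same product applies to any channel whose time dispersion and symbol rate are known.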

In addition to linear distortion, signals transmitted through telephone 
channels are subject to other impairments, specifically nonlinear distortion, 
frequency offset, phase jitter, impulse noise and thermal noise. 

Nonlinear distortion in telephone channels arises from nonlinearities in
amplifiers and compandors used in the telephone system. This type of
distortion is usually small and it is very difficult to correct.

FIGURE 9-1-3 Impulse response of average channel with amplitude and delay shown in Fig. 9-1-2.

A small frequency offset, usually less than 5 Hz, results from the use of
carrier equipment in the telephone channel. Such an offset cannot be tolerated 
in high-speed digital transmission systems that use synchronous phase-coherent 
demodulation. The offset is usually compensated for by the carrier recovery 
loop in the demodulator. 

Phase jitter is basically a low-index frequency modulation of the transmitted 
signal with the low frequency harmonics of the power line frequency 
(50-60 Hz). Phase jitter poses a serious problem in digital transmission at high
rates. However, it can be tracked and compensated for, to some extent, at the
demodulator. 

Impulse noise is an additive disturbance. It arises primarily from the 
switching equipment in the telephone system. Thermal (gaussian) noise is also 
present at levels of 20-30 dB below the signal. 

The degree to which one must be concerned with these channel impairments
depends on the transmission rate over the channel and the modulation
technique. For rates below 1800 bits/s ($R/W < 1$), one can choose a modulation
technique, e.g., FSK, that is relatively insensitive to the amount of
distortion encountered on typical telephone channels from all the sources listed
above. For rates between 1800 and 2400 bits/s ($R/W \approx 1$), a more bandwidth-efficient
modulation technique such as four-phase PSK is usually employed. At
these rates, some form of compromise equalization is often employed to
compensate for the average amplitude and delay distortion in the channel. In
addition, the carrier recovery method is designed to compensate for the
frequency offset. The other channel impairments are not that serious in their
effects on the error rate performance at these rates. At transmission rates
above 2400 bits/s ($R/W > 1$), bandwidth-efficient coded modulation techniques
such as trellis-coded QAM, PAM, and PSK are employed. For such rates,
special attention must be paid to linear distortion, frequency offset, and phase
jitter. Linear distortion is usually compensated for by means of an adaptive
equalizer. Phase jitter is handled by a combination of signal design and some
type of phase compensation at the demodulator. At rates above 9600 bits/s,
special attention must be paid not only to linear distortion, phase jitter, and
frequency offset, but also to the other channel impairments mentioned above.

Unfortunately, a channel model that encompasses all the impairments listed 
above becomes difficult to analyze. For mathematical tractability the channel 
model that is adopted in this and the next two chapters is a linear filter that 
introduces amplitude and delay distortion and adds gaussian noise. 

Besides the telephone channels, there are other physical channels that 
exhibit some form of time dispersion, and thus, introduce intersymbol 
interference. Radio channels such as shortwave ionospheric propagation (HF) 
and tropospheric scatter are two examples of time-dispersive channels. In these 
channels, time dispersion and, hence, intersymbol interference is the result of 
multiple propagation paths with different path delays. The number of paths
and the relative time delays among the paths vary with time, and, for this
reason, these radio channels are usually called time-variant multipath channels.
The time-variant multipath conditions give rise to a wide variety of frequency
response characteristics. Consequently the frequency response characterization
that is used for telephone channels is inappropriate for time-variant multipath
channels. Instead, these radio channels are characterized statistically, as
explained in more detail in Chapter 14, in terms of the scattering function,
which, in brief, is a two-dimensional representation of the average received
signal power as a function of relative time delay and Doppler frequency.

FIGURE 9-1-4 Scattering function of a medium-range tropospheric scatter channel (axis: frequency in Hz).

For illustrative purposes, a scattering function measured on a medium-range
(150 mi) tropospheric scatter channel is shown in Fig. 9-1-4. The total time
duration (multipath spread) of the channel response is approximately 0.7 μs on
the average, and the spread between “half-power points” in Doppler frequency
is a little less than 1 Hz on the strongest path and somewhat larger on
the other paths. Typically, if one is transmitting at a rate of $10^7$ symbols/s over
such a channel, the multipath spread of 0.7 μs will result in intersymbol
interference that spans about seven symbols.

In this chapter, we deal exclusively with the linear time-invariant filter
model for a band-limited channel. The adaptive equalization techniques
presented in Chapters 10 and 11 for combating intersymbol interference are
also applicable to time-variant multipath channels, under the condition that
the time variations in the channel are relatively slow in comparison to the total
channel bandwidth or, equivalently, to the symbol transmission rate over the
channel.


9-2 SIGNAL DESIGN FOR BAND-LIMITED 
CHANNELS 

It was shown in Chapter 4 that the equivalent lowpass transmitted signal for
several different types of digital modulation techniques has the common form

$$v(t) = \sum_{n=0}^{\infty} I_n g(t - nT) \tag{9-2-1}$$

where $\{I_n\}$ represents the discrete information-bearing sequence of symbols and
$g(t)$ is a pulse that, for the purposes of this discussion, is assumed to have a
band-limited frequency response characteristic $G(f)$, i.e., $G(f) = 0$ for $|f| > W$.
This signal is transmitted over a channel having a frequency response $C(f)$,
also limited to $|f| \le W$. Consequently, the received signal can be represented as

$$r_l(t) = \sum_{n=0}^{\infty} I_n h(t - nT) + z(t) \tag{9-2-2}$$

where

$$h(t) = \int_{-\infty}^{\infty} g(\tau)c(t - \tau)\,d\tau \tag{9-2-3}$$

and $z(t)$ represents the additive white Gaussian noise.

Let us suppose that the received signal is passed first through a filter and
then sampled at a rate $1/T$ samples/s. We shall show in a subsequent section
that the optimum filter from the point of view of signal detection is one
matched to the received pulse. That is, the frequency response of the receiving
filter is $H^*(f)$. We denote the output of the receiving filter as

$$y(t) = \sum_{n=0}^{\infty} I_n x(t - nT) + \nu(t) \tag{9-2-4}$$

where $x(t)$ is the pulse representing the response of the receiving filter to the
input pulse $h(t)$ and $\nu(t)$ is the response of the receiving filter to the noise $z(t)$.
Now, if $y(t)$ is sampled at times $t = kT + \tau_0$, $k = 0, 1, \ldots$, we have

$$y(kT + \tau_0) \equiv y_k = \sum_{n=0}^{\infty} I_n x(kT - nT + \tau_0) + \nu(kT + \tau_0) \tag{9-2-5}$$

or, equivalently,

$$y_k = \sum_{n=0}^{\infty} I_n x_{k-n} + \nu_k, \qquad k = 0, 1, \ldots \tag{9-2-6}$$

where $\tau_0$ is the transmission delay through the channel. The sample values can
be expressed as

$$y_k = x_0\left(I_k + \frac{1}{x_0}\sum_{\substack{n=0 \\ n \ne k}}^{\infty} I_n x_{k-n}\right) + \nu_k, \qquad k = 0, 1, \ldots \tag{9-2-7}$$

We regard $x_0$ as an arbitrary scale factor, which we arbitrarily set equal to
unity for convenience. Then

$$y_k = I_k + \sum_{\substack{n=0 \\ n \ne k}}^{\infty} I_n x_{k-n} + \nu_k \tag{9-2-8}$$

The term $I_k$ represents the desired information symbol at the $k$th sampling
instant, the term

$$\sum_{\substack{n=0 \\ n \ne k}}^{\infty} I_n x_{k-n}$$

represents the intersymbol interference (ISI), and $\nu_k$ is the additive gaussian
noise variable at the $k$th sampling instant.

FIGURE 9-2-1
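The decomposition in (9-2-8) into desired symbol, ISI, and noise can be illustrated with a short simulation. This is a sketch under stated assumptions: the sampled pulse values in `x`, the noise level, and the block length are hypothetical numbers chosen for illustration, and the infinite sum is truncated to a finite block of binary PAM symbols.

```python
import numpy as np

# Hypothetical sampled pulse x_m = x(mT + tau_0); x_0 is normalized to 1.
x = {-1: 0.10, 0: 1.00, 1: 0.25, 2: -0.05}

rng = np.random.default_rng(0)
symbols = rng.choice([-1.0, 1.0], size=12)          # binary PAM symbols I_n
noise = 0.05 * rng.standard_normal(symbols.size)    # noise samples nu_k

def split_samples(symbols, x):
    """Return (desired, isi): desired[k] = x_0*I_k, isi[k] = sum_{n != k} I_n x_{k-n}."""
    K = len(symbols)
    desired = x[0] * symbols
    isi = np.zeros(K)
    for k in range(K):
        for n in range(K):
            m = k - n
            if m != 0 and m in x:
                isi[k] += symbols[n] * x[m]
    return desired, isi

desired, isi = split_samples(symbols, x)
y = desired + isi + noise                            # samples y_k

for k in range(4):
    print(f"k={k}: I_k={symbols[k]:+.0f}  ISI={isi[k]:+.3f}  y_k={y[k]:+.3f}")
```

For binary symbols the magnitude of the ISI term never exceeds the sum of $|x_m|$ over $m \ne 0$, which is 0.40 for the assumed pulse samples.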

The amount of intersymbol interference and noise in a digital communications
system can be viewed on an oscilloscope. For PAM signals, we can
display the received signal $y(t)$ on the vertical input with the horizontal sweep
rate set at $1/T$. The resulting oscilloscope display is called an eye pattern
because of its resemblance to the human eye. For example, Fig. 9-2-1
illustrates the eye patterns for binary and four-level PAM modulation. The
effect of ISI is to cause the eye to close, thereby reducing the margin for
additive noise to cause errors. Figure 9-2-2 graphically illustrates the effect of
intersymbol interference in reducing the opening of a binary eye. Note that
intersymbol interference distorts the position of the zero-crossings and causes
a reduction in the eye opening. Thus, it causes the system to be more sensitive
to a synchronization error.

Examples of eye patterns for binary and quaternary amplitude shift keying (or PAM). Panels: binary; quaternary.

FIGURE 9-2-2 Effect of intersymbol interference on eye opening.

For PSK and QAM it is customary to display the “eye pattern” as a
two-dimensional scatter diagram illustrating the sampled values $\{y_k\}$ that
represent the decision variables at the sampling instants. Figure 9-2-3 illustrates
such an eye pattern for an 8-PSK signal. In the absence of intersymbol
interference and noise, the superimposed signals at the sampling instants would
result in eight distinct points corresponding to the eight transmitted signal
phases. Intersymbol interference and noise result in a deviation of the received
samples $\{y_k\}$ from the desired 8-PSK signal. The larger the intersymbol
interference and noise, the larger the scattering of the received signal samples
relative to the transmitted signal points.

FIGURE 9-2-3 Two-dimensional digital “eye patterns”: (a) transmitted eight-phase signal; (b) received signal samples at the output of demodulator.

Below, we consider the problem of signal design under the condition that
there is no intersymbol interference at the sampling instants.


9-2-1 DESIGN OF BAND-LIMITED SIGNALS FOR NO
INTERSYMBOL INTERFERENCE: THE NYQUIST
CRITERION


For the discussion in this section and in Section 9-2-2, we assume that the
band-limited channel has ideal frequency response characteristics, i.e., $C(f) = 1$
for $|f| \le W$. Then the pulse $x(t)$ has a spectral characteristic $X(f) = |G(f)|^2$,
where

$$x(t) = \int_{-W}^{W} X(f)e^{j2\pi ft}\,df \tag{9-2-9}$$

We are interested in determining the spectral properties of the pulse $x(t)$ and,
hence, the transmitted pulse $g(t)$, that results in no intersymbol interference.
Since

$$y_k = I_k + \sum_{\substack{n=0 \\ n \ne k}}^{\infty} I_n x_{k-n} + \nu_k \tag{9-2-10}$$

the condition for no intersymbol interference is

$$x(t = kT) \equiv x_k = \begin{cases} 1 & (k = 0) \\ 0 & (k \ne 0) \end{cases} \tag{9-2-11}$$


Below, we derive the necessary and sufficient condition on $X(f)$ in order for
$x(t)$ to satisfy the above relation. This condition is known as the Nyquist
pulse-shaping criterion or Nyquist condition for zero ISI and is stated in the
following theorem.


Theorem (Nyquist)

The necessary and sufficient condition for $x(t)$ to satisfy

$$x(nT) = \begin{cases} 1 & (n = 0) \\ 0 & (n \ne 0) \end{cases} \tag{9-2-12}$$

is that its Fourier transform $X(f)$ satisfy

$$\sum_{m=-\infty}^{\infty} X(f + m/T) = T \tag{9-2-13}$$

Proof 

In general, $x(t)$ is the inverse Fourier transform of $X(f)$. Hence,

$$x(t) = \int_{-\infty}^{\infty} X(f)e^{j2\pi ft}\,df \tag{9-2-14}$$

At the sampling instants $t = nT$, this relation becomes

$$x(nT) = \int_{-\infty}^{\infty} X(f)e^{j2\pi fnT}\,df \tag{9-2-15}$$

Let us break up the integral in (9-2-15) into integrals covering the finite
range of $1/T$. Thus, we obtain

$$\begin{aligned}
x(nT) &= \sum_{m=-\infty}^{\infty} \int_{(2m-1)/2T}^{(2m+1)/2T} X(f)e^{j2\pi fnT}\,df \\
&= \sum_{m=-\infty}^{\infty} \int_{-1/2T}^{1/2T} X(f + m/T)e^{j2\pi fnT}\,df \\
&= \int_{-1/2T}^{1/2T} \left[\sum_{m=-\infty}^{\infty} X(f + m/T)\right] e^{j2\pi fnT}\,df \\
&= \int_{-1/2T}^{1/2T} B(f)e^{j2\pi fnT}\,df
\end{aligned} \tag{9-2-16}$$

where we have defined $B(f)$ as

$$B(f) = \sum_{m=-\infty}^{\infty} X(f + m/T) \tag{9-2-17}$$

Obviously $B(f)$ is a periodic function with period $1/T$, and, therefore, it can be
expanded in terms of its Fourier series coefficients $\{b_n\}$ as

$$B(f) = \sum_{n=-\infty}^{\infty} b_n e^{j2\pi nfT} \tag{9-2-18}$$

where

$$b_n = T\int_{-1/2T}^{1/2T} B(f)e^{-j2\pi nfT}\,df \tag{9-2-19}$$

Comparing (9-2-19) and (9-2-16), we obtain

$$b_n = Tx(-nT) \tag{9-2-20}$$

Therefore, the necessary and sufficient condition for (9-2-11) to be satisfied is
that

$$b_n = \begin{cases} T & (n = 0) \\ 0 & (n \ne 0) \end{cases} \tag{9-2-21}$$

which, when substituted into (9-2-18), yields

$$B(f) = T \tag{9-2-22}$$

or, equivalently,

$$\sum_{m=-\infty}^{\infty} X(f + m/T) = T \tag{9-2-23}$$

This concludes the proof of the theorem.

Now suppose that the channel has a bandwidth of W. Then $C(f) \equiv 0$ for
$|f| > W$ and, consequently, $X(f) = 0$ for $|f| > W$. We distinguish three cases.

1 When $T < 1/2W$, or, equivalently, $1/T > 2W$, since $B(f) = \sum_{n=-\infty}^{\infty} X(f + n/T)$
consists of nonoverlapping replicas of $X(f)$, separated by $1/T$ as shown
in Fig. 9-2-4, there is no choice for $X(f)$ to ensure $B(f) \equiv T$ in this case and
there is no way that we can design a system with no ISI.

2 When $T = 1/2W$, or, equivalently, $1/T = 2W$ (the Nyquist rate), the
replications of $X(f)$, separated by $1/T$, are as shown in Fig. 9-2-5. It is clear
that in this case there exists only one $X(f)$ that results in $B(f) = T$, namely,

$$X(f) = \begin{cases} T & (|f| < W) \\ 0 & (\text{otherwise}) \end{cases} \tag{9-2-24}$$

which corresponds to the pulse

$$x(t) = \frac{\sin(\pi t/T)}{\pi t/T} \equiv \operatorname{sinc}\left(\frac{\pi t}{T}\right) \tag{9-2-25}$$

This means that the smallest value of $T$ for which transmission with zero ISI is
possible is $T = 1/2W$, and for this value, $x(t)$ has to be a sinc function. The
difficulty with this choice of $x(t)$ is that it is noncausal and therefore
nonrealizable. To make it realizable, usually a delayed version of it, i.e.,
$\operatorname{sinc}[\pi(t - t_0)/T]$ is used and $t_0$ is chosen such that for $t < 0$, we have
$\operatorname{sinc}[\pi(t - t_0)/T] \approx 0$. Of course, with this choice of $x(t)$, the sampling time
must also be shifted to $mT + t_0$. A second difficulty with this pulse shape is that
its rate of convergence to zero is slow. The tails of $x(t)$ decay as $1/t$;
consequently, a small mistiming error in sampling the output of the matched
filter at the demodulator results in an infinite series of ISI components. Such a
series is not absolutely summable because of the $1/t$ rate of decay of the pulse,
and, hence, the sum of the resulting ISI does not converge.

3 When $T > 1/2W$, $B(f)$ consists of overlapping replications of $X(f)$
separated by $1/T$, as shown in Fig. 9-2-6. In this case, there exist numerous
choices for $X(f)$ such that $B(f) \equiv T$.

FIGURE 9-2-4 Plot of $B(f)$ for the case $T < 1/2W$.

FIGURE 9-2-5 Plot of $B(f)$ for the case $T = 1/2W$.

FIGURE 9-2-6 Plot of $B(f)$ for the case $T > 1/2W$.
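The slow $1/t$ decay noted in case 2 can be illustrated numerically. The sketch below, which assumes NumPy and an arbitrarily chosen timing error of 10% of $T$, sums the magnitudes of the sinc pulse at the mistimed sampling instants; this sum bounds the ISI for binary $\pm 1$ symbols. Note that `np.sinc(u)` computes $\sin(\pi u)/(\pi u)$, so the text's $\operatorname{sinc}(\pi t/T)$ corresponds to `np.sinc(t / T)`.

```python
import numpy as np

def worst_case_isi_sinc(eps, n_terms):
    """Sum of |x((eps - n)T)| over 1 <= |n| <= n_terms for x(t) = sin(pi t/T)/(pi t/T)."""
    n = np.arange(1, n_terms + 1)
    # np.sinc(u) = sin(pi*u)/(pi*u), i.e. the text's sinc(pi*t/T) with u = t/T.
    return np.sum(np.abs(np.sinc(eps - n))) + np.sum(np.abs(np.sinc(eps + n)))

eps = 0.1   # assumed timing error, as a fraction of T
partial_sums = {n_terms: worst_case_isi_sinc(eps, n_terms)
                for n_terms in (10, 100, 1000, 10000)}

for n_terms, total in partial_sums.items():
    print(f"{n_terms:6d} interfering symbols per side: sum of |ISI terms| = {total:.3f}")
```

Each tenfold increase in the number of interfering symbols adds roughly the same amount to the sum, which is the logarithmic growth expected of a series whose terms decay as $1/n$.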

A particular pulse spectrum, for the $T > 1/2W$ case, that has desirable
spectral properties and has been widely used in practice is the raised cosine
spectrum. The raised cosine frequency characteristic is given as (see Problem
9-11)

$$X_{rc}(f) = \begin{cases} T & \left(0 \le |f| \le \dfrac{1 - \beta}{2T}\right) \\[2ex] \dfrac{T}{2}\left\{1 + \cos\left[\dfrac{\pi T}{\beta}\left(|f| - \dfrac{1 - \beta}{2T}\right)\right]\right\} & \left(\dfrac{1 - \beta}{2T} \le |f| \le \dfrac{1 + \beta}{2T}\right) \\[2ex] 0 & \left(|f| > \dfrac{1 + \beta}{2T}\right) \end{cases} \tag{9-2-26}$$

where $\beta$ is called the rolloff factor, and takes values in the range $0 \le \beta \le 1$. The
bandwidth occupied by the signal beyond the Nyquist frequency $1/2T$ is called
the excess bandwidth and is usually expressed as a percentage of the Nyquist
frequency. For example, when $\beta = \tfrac{1}{2}$, the excess bandwidth is 50%, and when
$\beta = 1$, the excess bandwidth is 100%. The pulse $x(t)$, having the raised cosine
spectrum, is

$$x(t) = \frac{\sin(\pi t/T)}{\pi t/T}\,\frac{\cos(\pi\beta t/T)}{1 - 4\beta^2t^2/T^2} = \operatorname{sinc}(\pi t/T)\,\frac{\cos(\pi\beta t/T)}{1 - 4\beta^2t^2/T^2} \tag{9-2-27}$$

FIGURE 9-2-7 Pulses having a raised cosine spectrum.

Note that $x(t)$ is normalized so that $x(0) = 1$. Figure 9-2-7 illustrates the raised
cosine spectral characteristics and the corresponding pulses for $\beta = 0$, $\tfrac{1}{2}$, and 1.
Note that for $\beta = 0$, the pulse reduces to $x(t) = \operatorname{sinc}(\pi t/T)$, and the symbol
rate $1/T = 2W$. When $\beta = 1$, the symbol rate is $1/T = W$. In general, the tails
of $x(t)$ decay as $1/t^3$ for $\beta > 0$. Consequently, a mistiming error in sampling
leads to a series of ISI components that converges to a finite value.
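Both forms of the Nyquist condition can be checked numerically for the raised cosine pulse. The following is a sketch, assuming NumPy, $T = 1$, and $\beta = 0.5$ as illustrative values; the function names `rc_spectrum` and `rc_pulse` are introduced here and implement the standard raised cosine spectrum with rolloff $\beta$ and the pulse (9-2-27). It verifies that $x(nT)$ vanishes for $n \ne 0$ and that the folded spectrum $\sum_m X(f + m/T)$ equals $T$.

```python
import numpy as np

def rc_spectrum(f, T, beta):
    """Raised cosine spectrum X_rc(f) with rolloff factor beta (0 <= beta <= 1)."""
    f = np.abs(np.asarray(f, dtype=float))
    f1 = (1.0 - beta) / (2.0 * T)        # end of the flat region
    f2 = (1.0 + beta) / (2.0 * T)        # end of the rolloff region
    X = np.zeros_like(f)
    X[f <= f1] = T
    roll = (f > f1) & (f <= f2)
    if beta > 0:
        X[roll] = 0.5 * T * (1.0 + np.cos(np.pi * T / beta * (f[roll] - f1)))
    return X

def rc_pulse(t, T, beta):
    """x(t) = sinc(pi t/T) cos(pi beta t/T) / (1 - 4 beta^2 t^2 / T^2)."""
    t = np.asarray(t, dtype=float)
    denom = 1.0 - (2.0 * beta * t / T) ** 2
    singular = np.isclose(denom, 0.0)
    ratio = np.cos(np.pi * beta * t / T) / np.where(singular, 1.0, denom)
    # At t = +/- T/(2*beta) the ratio is 0/0; its limit is pi/4.
    ratio = np.where(singular, np.pi / 4.0, ratio)
    return np.sinc(t / T) * ratio        # np.sinc(u) = sin(pi*u)/(pi*u)

T, beta = 1.0, 0.5                        # illustrative values
n = np.arange(-8, 9)
samples = rc_pulse(n * T, T, beta)

f = np.linspace(-0.5 / T, 0.5 / T, 201)
folded = sum(rc_spectrum(f + m / T, T, beta) for m in range(-2, 3))

print("max |x(nT) - delta_n| =", np.max(np.abs(samples - (n == 0))))
print("max |B(f) - T|        =", np.max(np.abs(folded - T)))
```

Only the replicas for $|m| \le 2$ are summed because the spectrum is zero beyond $(1 + \beta)/2T$, so the remaining shifts contribute nothing on the interval $|f| \le 1/2T$.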

Due to the smooth characteristics of the raised cosine spectrum, it is possible
to design practical filters for the transmitter and the receiver that approximate
the overall desired frequency response. In the special case where the channel is
ideal, i.e., $C(f) = 1$, $|f| \le W$, we have

$$X_{rc}(f) = G_T(f)G_R(f) \tag{9-2-28}$$

where $G_T(f)$ and $G_R(f)$ are the frequency responses of the two filters. In this
case, if the receiver filter is matched to the transmitter filter, we have
$X_{rc}(f) = G_T(f)G_R(f) = |G_T(f)|^2$. Ideally,

$$G_T(f) = \sqrt{|X_{rc}(f)|}\,e^{-j2\pi ft_0} \tag{9-2-29}$$

and $G_R(f) = G_T^*(f)$, where $t_0$ is some nominal delay that is required to ensure
physical realizability of the filter. Thus, the overall raised cosine spectral
characteristic is split evenly between the transmitting filter and the receiving
filter. Note also that an additional delay is necessary to ensure the physical
realizability of the receiving filter.


9-2-2 Design of Band-Limited Signals with Controlled ISI: Partial-Response Signals

As we have observed from our discussion of signal design for zero ISI, it is
necessary to reduce the symbol rate $1/T$ below the Nyquist rate of $2W$
symbols/s to realize practical transmitting and receiving filters. On the other
hand, suppose we choose to relax the condition of zero ISI and, thus, achieve a
symbol transmission rate of $2W$ symbols/s. By allowing for a controlled
amount of ISI, we can achieve this symbol rate.

We have already seen that the condition for zero ISI is $x(nT) = 0$ for $n \ne 0$.
However, suppose that we design the band-limited signal to have controlled
ISI at one time instant. This means that we allow one additional nonzero value
in the samples $\{x(nT)\}$. The ISI that we introduce is deterministic or
“controlled” and, hence, it can be taken into account at the receiver, as
discussed below.

One special case that leads to (approximately) physically realizable
transmitting and receiving filters is specified by the samples†

$$x\left(\frac{n}{2W}\right) \equiv x(nT) = \begin{cases} 1 & (n = 0, 1) \\ 0 & (\text{otherwise}) \end{cases} \tag{9-2-30}$$

Now, using (9-2-20), we obtain

$$b_n = \begin{cases} T & (n = 0, -1) \\ 0 & (\text{otherwise}) \end{cases} \tag{9-2-31}$$

which, when substituted into (9-2-18), yields

$$B(f) = T + Te^{-j2\pi fT} \tag{9-2-32}$$

As in the preceding section, it is impossible to satisfy the above equation for
$T < 1/2W$. However, for $T = 1/2W$, we obtain

$$X(f) = \begin{cases} \dfrac{1}{2W}\left(1 + e^{-j\pi f/W}\right) & (|f| < W) \\ 0 & (\text{otherwise}) \end{cases} = \begin{cases} \dfrac{1}{W}e^{-j\pi f/2W}\cos\dfrac{\pi f}{2W} & (|f| < W) \\ 0 & (\text{otherwise}) \end{cases} \tag{9-2-33}$$

Therefore, $x(t)$ is given by

$$x(t) = \operatorname{sinc}(2\pi Wt) + \operatorname{sinc}\left[2\pi\left(Wt - \tfrac{1}{2}\right)\right] \tag{9-2-34}$$

This pulse is called a duobinary signal pulse. It is illustrated along with its
magnitude spectrum in Fig. 9-2-8. Note that the spectrum decays to zero
smoothly, which means that physically realizable filters can be designed that
approximate this spectrum very closely. Thus, a symbol rate of $2W$ is achieved.

† It is convenient to deal with samples of $x(t)$ that are normalized to unity for $n = 0, 1$.

FIGURE 9-2-8 Time domain and frequency domain characteristics of a duobinary signal.

Another special case that leads to (approximately) physically realizable
transmitting and receiving filters is specified by the samples

$$x\left(\frac{n}{2W}\right) = x(nT) = \begin{cases} 1 & (n = -1) \\ -1 & (n = 1) \\ 0 & (\text{otherwise}) \end{cases} \tag{9-2-35}$$

The corresponding pulse $x(t)$ is given as

$$x(t) = \operatorname{sinc}\left[\frac{\pi(t + T)}{T}\right] - \operatorname{sinc}\left[\frac{\pi(t - T)}{T}\right] \tag{9-2-36}$$

and its spectrum is

$$X(f) = \begin{cases} \dfrac{1}{2W}\left(e^{j\pi f/W} - e^{-j\pi f/W}\right) = \dfrac{j}{W}\sin\dfrac{\pi f}{W} & (|f| \le W) \\ 0 & (|f| > W) \end{cases} \tag{9-2-37}$$

This pulse and its magnitude spectrum are illustrated in Fig. 9-2-9. It is called a
modified duobinary signal pulse. It is interesting to note that the spectrum of
this signal has a zero at $f = 0$, making it suitable for transmission over a
channel that does not pass d.c.

FIGURE 9-2-9 Time domain and frequency domain characteristics of a modified duobinary signal.
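The sample values and spectral properties of the duobinary and modified duobinary pulses can be confirmed numerically. This is a minimal sketch assuming NumPy and a normalized bandwidth $W = 1$ (an arbitrary choice). Because `np.sinc(u)` is $\sin(\pi u)/(\pi u)$, the text's $\operatorname{sinc}(2\pi Wt)$ corresponds to `np.sinc(2 * W * t)`.

```python
import numpy as np

W = 1.0                 # assumed bandwidth (arbitrary units)
T = 1.0 / (2.0 * W)     # symbol interval at the Nyquist rate 2W

def duobinary(t):
    # Text notation: sinc(2*pi*W*t) + sinc[2*pi*(W*t - 1/2)].
    return np.sinc(2.0 * W * t) + np.sinc(2.0 * W * t - 1.0)

def modified_duobinary(t):
    # Text notation: sinc[pi*(t + T)/T] - sinc[pi*(t - T)/T].
    return np.sinc((t + T) / T) - np.sinc((t - T) / T)

n = np.arange(-4, 5)
duo_samples = duobinary(n * T)
mod_samples = modified_duobinary(n * T)
print("n                :", n)
print("duobinary x(nT)  :", np.round(duo_samples, 12))
print("modified  x(nT)  :", np.round(mod_samples, 12))

# Spectra inside |f| <= W, from the exponential forms given in the text.
f = np.linspace(-W, W, 201)
X_duo = (1.0 + np.exp(-1j * np.pi * f / W)) / (2.0 * W)
X_mod = (np.exp(1j * np.pi * f / W) - np.exp(-1j * np.pi * f / W)) / (2.0 * W)
print("|X_duo| at band edge:", abs(X_duo[-1]))
print("|X_mod| at f = 0    :", abs(X_mod[100]))
```

The duobinary magnitude spectrum falls smoothly to zero at the band edge, and the modified duobinary spectrum has its null at $f = 0$, matching the properties stated above.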

One can obtain other interesting and physically realizable filter characteristics,
as shown by Kretzmer (1966) and Lucky et al. (1968), by selecting
different values for the samples $\{x(n/2W)\}$ and more than two nonzero
samples. However, as we select more nonzero samples, the problem of
unraveling the controlled ISI becomes more cumbersome and impractical.

In general, the class of bandlimited signal pulses that have the form

$$x(t) = \sum_{n=-\infty}^{\infty} x\left(\frac{n}{2W}\right)\operatorname{sinc}\left[2\pi W\left(t - \frac{n}{2W}\right)\right] \tag{9-2-38}$$

and their corresponding spectra

$$X(f) = \begin{cases} \dfrac{1}{2W}\displaystyle\sum_{n=-\infty}^{\infty} x\left(\frac{n}{2W}\right)e^{-jn\pi f/W} & (|f| \le W) \\ 0 & (|f| > W) \end{cases} \tag{9-2-39}$$

are called partial-response signals when controlled ISI is purposely introduced
by selecting two or more nonzero samples from the set $\{x(n/2W)\}$. The
resulting signal pulses allow us to transmit information symbols at the Nyquist
rate of $2W$ symbols/s. The detection of the received symbols in the presence of
controlled ISI is described below.


Alternative Characterization of Partial-Response Signals We conclude 
this subsection by presenting another interpretation of a partial-response 
signal. Suppose that the partial-response signal is generated, as shown in Fig. 
9-2-10, by passing the discrete-time sequence $\{I_n\}$ through a discrete-time filter 
with coefficients $x_n = x(n/2W)$, $n = 0, 1, \ldots, N - 1$, and using the output 
sequence $\{B_n\}$ from this filter to excite periodically, with an input $B_n\delta(t - nT)$, 
an analog filter having an impulse response $\operatorname{sinc}(2\pi Wt)$. The resulting output 
signal is identical to the partial-response signal given by (9-2-38). 

FIGURE 9-2-10 An alternative method for generating a partial-response signal.

Since 

$$B_n = \sum_{k=0}^{N-1} x_k I_{n-k} \tag{9-2-40}$$

the sequence of symbols $\{B_n\}$ is correlated as a consequence of the filtering 
performed on the sequence $\{I_n\}$. In fact, the autocorrelation function of the 
sequence $\{B_n\}$ is 

$$\phi(m) = E(B_n B_{n+m}) = \sum_{k=0}^{N-1}\sum_{l=0}^{N-1} x_k x_l E(I_{n-k} I_{n+m-l}) \tag{9-2-41}$$

When the input sequence is zero-mean and white, 

$$E(I_{n-k} I_{n+m-l}) = \delta_{m+k-l} \tag{9-2-42}$$

where we have used the normalization $E(I_n^2) = 1$. Substitution of (9-2-42) into 
(9-2-41) yields the desired autocorrelation function for $\{B_n\}$ in the form 

$$\phi(m) = \sum_{k=0}^{N-1-|m|} x_k x_{k+|m|}, \qquad m = 0, \pm 1, \ldots, \pm(N-1) \tag{9-2-43}$$

The corresponding power spectral density is 

$$\Phi(f) = \sum_{m=-(N-1)}^{N-1} \phi(m) e^{-j2\pi fmT} = \left|\sum_{m=0}^{N-1} x_m e^{-j2\pi fmT}\right|^2 \tag{9-2-44}$$

where $T = 1/2W$ and $|f| \le 1/2T = W$. 
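As a numerical check of (9-2-43) and (9-2-44), the following Python sketch evaluates the autocorrelation of the filtered sequence and compares the two forms of the power spectral density. The function names, the choice $W = 0.5$ (so that $T = 1$), and the causal tap vector used for the modified duobinary pulse are illustrative assumptions rather than anything prescribed by the text.

```python
import cmath
import math


def autocorrelation(x, m):
    """phi(m) of (9-2-43) for real taps x_0..x_{N-1} and white, unit-power input."""
    m = abs(m)
    return sum(x[k] * x[k + m] for k in range(len(x) - m))


def psd_from_autocorrelation(x, f, T):
    """First form of (9-2-44): sum over m of phi(m) exp(-j 2 pi f m T)."""
    N = len(x)
    total = sum(
        autocorrelation(x, m) * cmath.exp(-2j * math.pi * f * m * T)
        for m in range(-(N - 1), N)
    )
    return total.real  # imaginary parts cancel because phi(m) = phi(-m)


def psd_from_taps(x, f, T):
    """Second form of (9-2-44): |sum over m of x_m exp(-j 2 pi f m T)|^2."""
    s = sum(x[m] * cmath.exp(-2j * math.pi * f * m * T) for m in range(len(x)))
    return abs(s) ** 2


W = 0.5                          # assumed bandwidth, chosen so that T = 1/(2W) = 1
T = 1 / (2 * W)
DUOBINARY = [1, 1]               # x_0 = x_1 = 1
MODIFIED_DUOBINARY = [1, 0, -1]  # causal (delayed) form of x_{-1} = 1, x_1 = -1

# The two expressions in (9-2-44) should agree at every frequency.
for taps in (DUOBINARY, MODIFIED_DUOBINARY):
    for f in (0.0, 0.1, 0.25, 0.5):
        a = psd_from_autocorrelation(taps, f, T)
        b = psd_from_taps(taps, f, T)
        assert abs(a - b) < 1e-12
```

With these taps the duobinary density vanishes at $f = W$ and the modified duobinary density vanishes at $f = 0$, which agrees with the spectral nulls described for the two pulses.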


9-2-3 Data Detection for Controlled ISI 

In this section, we describe two methods for detecting the information symbols 
at the receiver when the received signal contains controlled ISI. One is a 
symbol-by-symbol detection method that is relatively easy to implement. The 
second method is based on the maximum-likelihood criterion for detecting a 
sequence of symbols. The latter method minimizes the probability of error but 
is a little more complex to implement. In particular, we consider the detection 
of the duobinary and the modified duobinary partial response signals. In both 
cases, we assume that the desired spectral characteristic $X(f)$ for the partial 
response signal is split evenly between the transmitting and receiving filters, 
i.e., $|G_T(f)| = |G_R(f)| = |X(f)|^{1/2}$. This treatment is based on PAM signals, but 
it is easily generalized to QAM and PSK. 


Symbol-by-Symbol Suboptimum Detection For the duobinary signal 
pulse, $x(nT) = 1$ for $n = 0, 1$, and zero otherwise. Hence, the samples at the 
output of the receiving filter (demodulator) have the form 

$$y_m = B_m + \nu_m = I_m + I_{m-1} + \nu_m \tag{9-2-45}$$

where $\{I_m\}$ is the transmitted sequence of amplitudes and $\{\nu_m\}$ is a sequence of 
additive gaussian noise samples. Let us ignore the noise for the moment and 
consider the binary case where $I_m = \pm 1$ with equal probability. Then $B_m$ takes 
on one of three possible values, namely, $B_m = -2, 0, 2$ with corresponding 
probabilities 1/4, 1/2, 1/4. If $I_{m-1}$ is the detected symbol from the $(m-1)$th 
signaling interval, its effect on $B_m$, the received signal in the $m$th signaling 
interval, can be eliminated by subtraction, thus allowing $I_m$ to be detected. This 
process can be repeated sequentially for every received symbol. 

The major problem with this procedure is that errors arising from the 
additive noise tend to propagate. For example, if $I_{m-1}$ is in error, its effect on 
$B_m$ is not eliminated but, in fact, it is reinforced by the incorrect subtraction. 
Consequently, the detection of $I_m$ is also likely to be in error. 
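The error propagation just described can be seen in a small experiment. The sketch below is only an illustration: the symbol pattern, the assumed known starting symbol $I_0 = -1$, and the single noise value of $-1.5$ are arbitrary choices, and the function names are hypothetical.

```python
def duobinary_samples(symbols, initial=-1):
    """Noise-free B_m = I_m + I_{m-1}; `initial` is an assumed known I_0."""
    previous, samples = initial, []
    for s in symbols:
        samples.append(s + previous)
        previous = s
    return samples


def feedback_detect(received, initial=-1):
    """Detect each I_m by subtracting the previous decision from y_m."""
    previous, decisions = initial, []
    for y in received:
        decision = 1 if y - previous >= 0 else -1
        decisions.append(decision)
        previous = decision  # a wrong decision here corrupts the next subtraction
    return decisions


symbols = [1, -1, 1, -1, 1, 1, -1]
received = duobinary_samples(symbols)
received[0] += -1.5          # a single noise hit on the first sample only

decisions = feedback_detect(received)
errors = sum(d != s for d, s in zip(decisions, symbols))
print(errors)                # one noisy sample, several symbol errors
```

In this particular pattern the one corrupted sample causes five consecutive decision errors; the run ends only when two equal symbols follow each other, which resynchronizes the subtraction.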

Error propagation can be avoided by precoding the data at the transmitter 
instead of eliminating the controlled ISI by subtraction at the receiver. The 
precoding is performed on the binary data sequence prior to modulation. From 
the data sequence $\{D_n\}$ of 1s and 0s that is to be transmitted, a new sequence 
$\{P_n\}$, called the precoded sequence, is generated. For the duobinary signal, the 
precoded sequence is defined as 

$$P_m = D_m \ominus P_{m-1}, \qquad m = 1, 2, \ldots \tag{9-2-46}$$

where $\ominus$ denotes modulo-2 subtraction.† Then we set $I_m = -1$ if $P_m = 0$ and 
$I_m = 1$ if $P_m = 1$, i.e., $I_m = 2P_m - 1$. Note that this precoding operation is 
identical to that described in Section 4-3-2 in the context of our discussion of 
an NRZI signal. 

The noise-free samples at the output of the receiving filter are given by 

$$B_m = I_m + I_{m-1} = (2P_m - 1) + (2P_{m-1} - 1) = 2(P_m + P_{m-1} - 1) \tag{9-2-47}$$

Consequently, 

$$P_m + P_{m-1} = \tfrac{1}{2}B_m + 1 \tag{9-2-48}$$

†Although this is identical to modulo-2 addition, it is convenient to view the precoding 
operation for duobinary in terms of modulo-2 subtraction. 

TABLE 9-2-1 BINARY SIGNALING WITH DUOBINARY PULSES 

Data sequence D_m                1   1   1   0   1   0   0   1   0   0   0   1   1   0   1
Precoded sequence P_m        0   1   0   1   1   0   0   0   1   1   1   1   0   1   1   0
Transmitted sequence I_m    -1   1  -1   1   1  -1  -1  -1   1   1   1   1  -1   1   1  -1
Received sequence B_m            0   0   0   2   0  -2  -2   0   2   2   2   0   0   2   0
Decoded sequence D_m             1   1   1   0   1   0   0   1   0   0   0   1   1   0   1


Since $D_m = P_m \oplus P_{m-1}$, it follows that the data sequence $D_m$ is obtained from 
$B_m$ using the relation 

$$D_m = \tfrac{1}{2}B_m + 1 \pmod 2 \tag{9-2-49}$$

Consequently, if $B_m = \pm 2$ then $D_m = 0$, and if $B_m = 0$ then $D_m = 1$. An 
example that illustrates the precoding and decoding operations is given in 
Table 9-2-1. In the presence of additive noise, the sampled outputs from the 
receiving filter are given by (9-2-45). In this case $y_m = B_m + \nu_m$ is compared 
with the two thresholds set at $+1$ and $-1$. The data sequence $\{D_n\}$ is obtained 
according to the detection rule 

$$D_m = \begin{cases} 1 & (|y_m| < 1) \\ 0 & (|y_m| \ge 1) \end{cases} \tag{9-2-50}$$
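The precoding and detection steps of (9-2-46) through (9-2-50) can be traced in a few lines of Python. This is a minimal sketch that reuses the data row of Table 9-2-1; the function names, the assumed starting value $P_0 = 0$, and the small alternating perturbation of $\pm 0.4$ are illustrative choices.

```python
def precode_binary(data, p0=0):
    """(9-2-46): P_m = D_m (-) P_{m-1} modulo 2, starting from an assumed P_0."""
    precoded = [p0]
    for d in data:
        precoded.append((d - precoded[-1]) % 2)
    return precoded


def transmit(precoded):
    """Map P_m to I_m = 2 P_m - 1 and form the noise-free B_m = I_m + I_{m-1}."""
    amplitudes = [2 * p - 1 for p in precoded]
    received = [amplitudes[m] + amplitudes[m - 1] for m in range(1, len(amplitudes))]
    return amplitudes, received


def detect(y):
    """(9-2-50): decide 1 when |y_m| < 1, otherwise 0."""
    return 1 if abs(y) < 1 else 0


data = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1]   # data row of Table 9-2-1
precoded = precode_binary(data)
amplitudes, received = transmit(precoded)
decoded = [detect(b) for b in received]

# A small, arbitrary perturbation (|noise| < 1) leaves the decisions unchanged.
noisy = [b + (0.4 if m % 2 else -0.4) for m, b in enumerate(received)]
decoded_noisy = [detect(y) for y in noisy]
```

Each decision uses only the current sample, so a wrong decision cannot influence the next one.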


The extension from binary PAM to multilevel PAM signaling using the 
duobinary pulses is straightforward. In this case the $M$-level amplitude 
sequence $\{I_m\}$ results in a (noise-free) sequence 

$$B_m = I_m + I_{m-1}, \qquad m = 1, 2, \ldots \tag{9-2-51}$$

which has $2M - 1$ possible equally spaced levels. The amplitude levels are 
determined from the relation 

$$I_m = 2P_m - (M - 1) \tag{9-2-52}$$

where $\{P_m\}$ is the precoded sequence that is obtained from an $M$-level data 
sequence $\{D_m\}$ according to the relation 

$$P_m = D_m \ominus P_{m-1} \pmod M \tag{9-2-53}$$

where the possible values of the sequence $\{D_m\}$ are $0, 1, 2, \ldots, M - 1$. 

In the absence of noise, the samples at the output of the receiving filter may 
be expressed as 

$$B_m = I_m + I_{m-1} = 2[P_m + P_{m-1} - (M - 1)] \tag{9-2-54}$$

TABLE 9-2-2 FOUR-LEVEL SIGNAL TRANSMISSION WITH DUOBINARY PULSES 

Data sequence D_m                0   0   1   3   1   2   0   3   3   2   0   1   0
Precoded sequence P_m        0   0   0   1   2   3   3   1   2   1   1   3   2   2
Transmitted sequence I_m    -3  -3  -3  -1   1   3   3  -1   1  -1  -1   3   1   1
Received sequence B_m           -6  -6  -4   0   4   6   2   0   0  -2   2   4   2
Decoded sequence D_m             0   0   1   3   1   2   0   3   3   2   0   1   0

Hence, 

$$P_m + P_{m-1} = \tfrac{1}{2}B_m + (M - 1) \tag{9-2-55}$$

Since $D_m = P_m + P_{m-1} \pmod M$, it follows that 

$$D_m = \tfrac{1}{2}B_m + (M - 1) \pmod M \tag{9-2-56}$$

An example illustrating multilevel precoding and decoding is given in Table 
9-2-2. 

In the presence of noise, the received signal-plus-noise is quantized to the 
nearest of the possible signal levels and the rule given above is used on the 
quantized values to recover the data sequence. 
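The multilevel procedure (precode, map to amplitudes, quantize, decode) can be sketched as follows, using the four-level data row of Table 9-2-2. The function names, the starting value $P_0 = 0$, and the constant offset of 0.6 that stands in for noise are assumptions made for illustration.

```python
def precode(data, M, p0=0):
    """(9-2-53): P_m = D_m (-) P_{m-1} modulo M, starting from an assumed P_0."""
    precoded = [p0]
    for d in data:
        precoded.append((d - precoded[-1]) % M)
    return precoded


def noise_free_samples(precoded, M):
    """(9-2-52) and (9-2-51): I_m = 2 P_m - (M - 1), B_m = I_m + I_{m-1}."""
    amplitudes = [2 * p - (M - 1) for p in precoded]
    return [amplitudes[m] + amplitudes[m - 1] for m in range(1, len(amplitudes))]


def quantize(y, M):
    """Nearest of the 2M - 1 levels -2(M-1), ..., -2, 0, 2, ..., 2(M-1)."""
    levels = range(-2 * (M - 1), 2 * (M - 1) + 1, 2)
    return min(levels, key=lambda level: abs(y - level))


def decode(level, M):
    """(9-2-56): D_m = B_m / 2 + (M - 1) modulo M, for an even integer level."""
    return (level // 2 + (M - 1)) % M


M = 4
data = [0, 0, 1, 3, 1, 2, 0, 3, 3, 2, 0, 1, 0]          # data row of Table 9-2-2
samples = noise_free_samples(precode(data, M), M)
noisy = [b + 0.6 for b in samples]                        # arbitrary offset below 1
decoded = [decode(quantize(y, M), M) for y in noisy]
```

Because adjacent received levels are two units apart, any disturbance smaller than one unit is removed by the quantizer before the modulo-$M$ rule is applied.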

In the case of the modified duobinary pulse, the controlled ISI is specified 
by the values $x(n/2W) = -1$ for $n = 1$, $x(n/2W) = 1$ for $n = -1$, and zero 
otherwise. Consequently, the noise-free sampled output from the receiving 
filter is given as 

$$B_m = I_m - I_{m-2} \tag{9-2-57}$$

where the $M$-level sequence $\{I_m\}$ is obtained by mapping a precoded sequence 
according to the relation (9-2-52) and 

$$P_m = D_m \oplus P_{m-2} \pmod M \tag{9-2-58}$$

From these relations, it is easy to show that the detection rule for recovering 
the data sequence $\{D_m\}$ from $\{B_m\}$ in the absence of noise is 

$$D_m = \tfrac{1}{2}B_m \pmod M \tag{9-2-59}$$

As demonstrated above, the precoding of the data at the transmitter makes 
it possible to detect the received data on a symbol-by-symbol basis without 
having to look back at previously detected symbols. Thus, error propagation is 
avoided. 
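The claim that (9-2-59) recovers the data for the modified duobinary pulse is easy to confirm numerically. The sketch below assumes two starting values $P_{-1} = P_0 = 0$ and uses short, arbitrary data sequences; the function names are hypothetical.

```python
def precode_modified(data, M, initial=(0, 0)):
    """(9-2-58): P_m = D_m (+) P_{m-2} modulo M; `initial` holds assumed P_{-1}, P_0."""
    precoded = list(initial)
    for d in data:
        precoded.append((d + precoded[-2]) % M)
    return precoded


def noise_free_samples(precoded, M):
    """(9-2-52) and (9-2-57): I_m = 2 P_m - (M - 1), B_m = I_m - I_{m-2}."""
    amplitudes = [2 * p - (M - 1) for p in precoded]
    return [amplitudes[m] - amplitudes[m - 2] for m in range(2, len(amplitudes))]


def decode(level, M):
    """(9-2-59): D_m = B_m / 2 modulo M, for an even integer level."""
    return (level // 2) % M


# Binary and four-level checks: decoding the noise-free samples returns the data.
for M, data in ((2, [1, 1, 1, 0, 1, 0, 0, 1]), (4, [0, 0, 1, 3, 1, 2, 0, 3])):
    samples = noise_free_samples(precode_modified(data, M), M)
    assert [decode(b, M) for b in samples] == data
```

The check works because $B_m = 2(P_m - P_{m-2})$, so half the received level equals $D_m$ modulo $M$.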

The symbol-by-symbol detection rule described above is not the optimum 
detection scheme for partial response signals due to the memory inherent in 
the received signal. Nevertheless, symbol-by-symbol detection is relatively 
simple to implement and is used in many practical applications involving 
duobinary and modified duobinary pulse signals. Its performance is evaluated 
in the following section. 

FIGURE 9-2-11 Trellis for duobinary partial response signal.

Maximum-Likelihood Sequence Detection It is clear from the above 
discussion that partial-response waveforms are signal waveforms with memory. 
This memory is conveniently represented by a trellis. For example, the trellis 
for the duobinary partial-response signal for binary data transmission is 
illustrated in Fig. 9-2-11. For binary modulation, this trellis contains two states, 
corresponding to the two possible input values of $I_m$, i.e., $I_m = \pm 1$. Each 
branch in the trellis is labeled by two numbers. The first number on the left is 
the new data bit, i.e., $I_{m+1} = \pm 1$. This number determines the transition to the 
new state. The number on the right is the received signal level. 

The duobinary signal has a memory of length $L = 1$. Hence, for binary 
modulation the trellis has $S_t = 2$ states. In general, for $M$-ary modulation, the 
number of trellis states is $M^L$. 

The optimum maximum-likelihood (ML) sequence detector selects the most 
probable path through the trellis upon observing the received data sequence 
$\{y_m\}$ at the sampling instants $t = mT$, $m = 1, 2, \ldots$. In general, each node in the 
trellis will have $M$ incoming paths and $M$ corresponding metrics. One out of 
the $M$ incoming paths is selected as the most probable, based on the values of 
the metrics, and the other $M - 1$ paths and their metrics are discarded. The 
surviving path at each node is then extended to $M$ new paths, one for each of 
the $M$ possible input symbols, and the search process continues. This is 
basically the Viterbi algorithm for performing the trellis search. 

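For the class of partial response signals, the received sequence $\{y_m, 1 \le m \le N\}$ 
is generally described statistically by the joint pdf $f(\mathbf{y}_N \mid \mathbf{I}_N)$, where 
$\mathbf{y}_N = [y_1 \; y_2 \; \cdots \; y_N]'$ and $\mathbf{I}_N = [I_1 \; I_2 \; \cdots \; I_N]'$ and $N > L$. When the additive 
noise is zero-mean gaussian, $f(\mathbf{y}_N \mid \mathbf{I}_N)$ is a multivariate gaussian pdf, i.e., 

$$f(\mathbf{y}_N \mid \mathbf{I}_N) = \frac{1}{(2\pi \det \mathbf{C})^{N/2}} \exp\left[-\tfrac{1}{2}(\mathbf{y}_N - \mathbf{B}_N)'\mathbf{C}^{-1}(\mathbf{y}_N - \mathbf{B}_N)\right] \tag{9-2-60}$$

where $\mathbf{B}_N = [B_1 \; B_2 \; \cdots \; B_N]'$ is the mean of the vector $\mathbf{y}_N$ and $\mathbf{C}$ is the $N \times N$ 
covariance matrix of $\mathbf{y}_N$. Then, the ML sequence detector selects the sequence 
through the trellis that maximizes the pdf $f(\mathbf{y}_N \mid \mathbf{I}_N)$. 

The computation for finding the most probable sequence through the trellis 
is simplified by taking the natural logarithm of $f(\mathbf{y}_N \mid \mathbf{I}_N)$. Thus, 

$$\ln f(\mathbf{y}_N \mid \mathbf{I}_N) = -\tfrac{1}{2}N \ln(2\pi \det \mathbf{C}) - \tfrac{1}{2}(\mathbf{y}_N - \mathbf{B}_N)'\mathbf{C}^{-1}(\mathbf{y}_N - \mathbf{B}_N) \tag{9-2-61}$$

Given the received sequence $\{y_m\}$, the data sequence $\{I_m\}$ that maximizes 
$\ln f(\mathbf{y}_N \mid \mathbf{I}_N)$ is identical to the sequence $\{I_m\}$ that minimizes 
$(\mathbf{y}_N - \mathbf{B}_N)'\mathbf{C}^{-1}(\mathbf{y}_N - \mathbf{B}_N)$, i.e., 

$$\hat{\mathbf{I}}_N = \arg\min_{\mathbf{I}_N} \left[(\mathbf{y}_N - \mathbf{B}_N)'\mathbf{C}^{-1}(\mathbf{y}_N - \mathbf{B}_N)\right] \tag{9-2-62}$$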

The metric computations in the trellis search are complicated by the 
correlation of the noise samples at the output of the matched filter for the 
partial response signal. For example, in the case of the duobinary signal 
waveform, the correlation of the noise sequence $\{\nu_m\}$ is over two successive 
signal samples. Hence, $\nu_m$ and $\nu_{m+k}$ are correlated for $k = 1$ and uncorrelated 
for $k > 1$. In general, a partial response signal waveform with memory $L$ will 
result in a correlated noise sequence at the output of the matched filter, which 
satisfies the condition $E(\nu_m \nu_{m+k}) = 0$ for $k > L$. In such a case, the Viterbi 
algorithm for performing the trellis search may be modified as described in 
Chapter 10. 

Some simplification in the metric computations results if we ignore the noise 
correlation by assuming that $E(\nu_m \nu_{m+k}) = 0$ for $k > 0$. Then, by assumption, 
the covariance matrix $\mathbf{C} = \sigma_\nu^2 \mathbb{I}_N$, where $\sigma_\nu^2 = E(\nu_m^2)$ and $\mathbb{I}_N$ is the $N \times N$ 
identity matrix.† In this case, (9-2-62) simplifies to 


$$\hat{\mathbf{I}}_N = \arg\min_{\mathbf{I}_N} \left[(\mathbf{y}_N - \mathbf{B}_N)'(\mathbf{y}_N - \mathbf{B}_N)\right] = \arg\min_{\mathbf{I}_N} \left[\sum_{m=1}^{N}\left(y_m - \sum_{k=0}^{L} x_k I_{m-k}\right)^2\right] \tag{9-2-63}$$

where 

$$B_m = \sum_{k=0}^{L} x_k I_{m-k}$$

and $x_k = x(kT)$ are the sampled values of the partial response signal waveform. 
In this case, the metric computations at each node of the trellis have the form 

$$DM_m(I_m) = DM_{m-1}(I_{m-1}) + \left(y_m - \sum_{k=0}^{L} x_k I_{m-k}\right)^2 \tag{9-2-64}$$

where $DM_m(I_m)$ are the distance metrics at time $t = mT$, $DM_{m-1}(I_{m-1})$ are the 
distance metrics at time $t = (m - 1)T$, and the second term on the right-hand 
side of (9-2-64) represents the new increments to the metrics based on the new 
received sample $y_m$. 
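To make the metric recursion (9-2-64) concrete, here is a minimal Python sketch of the trellis search for the binary duobinary signal ($L = 1$, $x_0 = x_1 = 1$). It ignores the noise correlation, as in (9-2-63), keeps full survivor paths without truncation, and assumes a known starting symbol $I_0 = -1$. The function name and the test sequence are illustrative only.

```python
def viterbi_duobinary(received, initial=-1):
    """ML estimate of I_1..I_N (each +/-1) from y_m = I_m + I_{m-1} + noise,
    using the squared-distance metric of (9-2-64) with white-noise assumption."""
    symbols = (-1, 1)
    # One metric and one survivor path per state; the state is the latest symbol.
    metrics = {s: (0.0 if s == initial else float("inf")) for s in symbols}
    paths = {s: [] for s in symbols}
    for y in received:
        new_metrics, new_paths = {}, {}
        for cur in symbols:
            # Compare the incoming branches and keep the one with the smaller metric.
            best_prev = min(
                symbols, key=lambda prev: metrics[prev] + (y - (cur + prev)) ** 2
            )
            new_metrics[cur] = metrics[best_prev] + (y - (cur + best_prev)) ** 2
            new_paths[cur] = paths[best_prev] + [cur]
        metrics, paths = new_metrics, new_paths
    best_state = min(symbols, key=lambda s: metrics[s])
    return paths[best_state]


# Noise-free samples for I = [1, -1, 1, -1, 1, 1, -1] with I_0 = -1 are
# [0, 0, 0, 0, 0, 2, 0]; the first sample is then disturbed by -1.5.
received = [-1.5, 0, 0, 0, 0, 2, 0]
estimate = viterbi_duobinary(received)
```

For this input the surviving path with the smallest metric is the transmitted sequence itself, even though the first sample alone would favor the wrong symbol.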


†We are using $\mathbb{I}_N$ here to avoid confusion with $\mathbf{I}_N$. 





As indicated in Section 5-1-4, ML sequence detection introduces a variable 
delay in detecting each transmitted information symbol. In practice, the 
variable delay is avoided by truncating the surviving sequences to the $N_t$ most 
recent symbols, where $N_t \gg 5L$, thus achieving a fixed delay. In the case that 
the $M^L$ surviving sequences at time $t = mT$ disagree on the symbol $I_{m-N_t}$, the 
symbol in the most probable surviving sequence may be chosen. The loss in 
performance resulting from this truncation is negligible if $N_t > 5L$. 


9-2-4 Signal Design for Channels with Distortion 

In Sections 9-2-1 and 9-2-2, we described signal design criteria for the 
modulation filter at the transmitter and the demodulation filter at the receiver 
when the channel is ideal. In this section, we perform the signal design under 
the condition that the channel distorts the transmitted signal. We assume that 
the channel frequency response $C(f)$ is known for $|f| \le W$ and that $C(f) = 0$ 
for $|f| > W$. The criterion for the optimization of the filter responses $G_T(f)$ and 
$G_R(f)$ is the maximization of the SNR at the output of the demodulation filter 
or, equivalently, at the input to the detector. The additive channel noise is 
assumed to be gaussian with power spectral density $\Phi_{nn}(f)$. Figure 9-2-12 
illustrates the overall system under consideration. 

For the signal component at the output of the demodulator, we must satisfy 
the condition 

$$G_T(f)C(f)G_R(f) = X_d(f)e^{-j2\pi ft_0}, \qquad |f| \le W \tag{9-2-65}$$

where $X_d(f)$ is the desired frequency response of the cascade of the 
modulator, channel, and demodulator, and $t_0$ is a time delay that is necessary 
to ensure the physical realizability of the modulation and demodulation filters. 
The desired frequency response $X_d(f)$ may be selected to yield either zero ISI 
or controlled ISI at the sampling instants. We shall carry out the optimization 
for zero ISI by selecting $X_d(f) = X_{rc}(f)$, where $X_{rc}(f)$ is the raised cosine 
spectrum with an arbitrary rolloff factor. 

The noise at the output of the demodulation filter may be expressed as 

$$\nu(t) = \int_{-\infty}^{\infty} n(t - \tau)g_R(\tau)\,d\tau \tag{9-2-66}$$

where $n(t)$ is the input to the filter. Since $n(t)$ is zero-mean gaussian, $\nu(t)$ is 
zero-mean gaussian, with a power spectral density 

$$\Phi_{\nu\nu}(f) = \Phi_{nn}(f)|G_R(f)|^2 \tag{9-2-67}$$

FIGURE 9-2-12 System model for the design of the modulation and demodulation filters.

For simplicity, we consider binary PAM transmission. Then, the sampled 
output of the matched filter is 

$$y_m = x_0 I_m + \nu_m = I_m + \nu_m \tag{9-2-68}$$

where $x_0$ is normalized† to unity, $I_m = \pm d$, and $\nu_m$ represents the noise term, 
which is zero-mean gaussian with variance 

$$\sigma_\nu^2 = \int_{-\infty}^{\infty} \Phi_{nn}(f)|G_R(f)|^2\,df \tag{9-2-69}$$

Consequently, the probability of error is 

$$P_2 = \frac{1}{\sqrt{2\pi}}\int_{d/\sigma_\nu}^{\infty} e^{-y^2/2}\,dy = Q\left(\sqrt{\frac{d^2}{\sigma_\nu^2}}\right) \tag{9-2-70}$$


The probability of error is minimized by maximizing the SNR $= d^2/\sigma_\nu^2$, or, 
equivalently, by minimizing the noise-to-signal ratio $\sigma_\nu^2/d^2$. But $d^2$ is related to 
the transmitted signal power as follows: 

$$P_{av} = \frac{E(I_m^2)}{T}\int_{-\infty}^{\infty} g_T^2(t)\,dt = \frac{d^2}{T}\int_{-W}^{W} |G_T(f)|^2\,df \tag{9-2-71}$$

However, $G_T(f)$ must be chosen to satisfy the zero ISI condition. 
Consequently, 

$$|G_T(f)| = \frac{|X_{rc}(f)|}{|C(f)||G_R(f)|}, \qquad |f| \le W \tag{9-2-72}$$

and $G_T(f) = 0$ for $|f| > W$. Hence 

$$\frac{1}{d^2} = \frac{1}{P_{av}T}\int_{-W}^{W} \frac{|X_{rc}(f)|^2}{|C(f)|^2|G_R(f)|^2}\,df \tag{9-2-73}$$

Therefore, the noise-to-signal ratio that must be minimized with respect to 
$|G_R(f)|$ for $|f| \le W$ is 

$$\frac{\sigma_\nu^2}{d^2} = \frac{1}{P_{av}T}\int_{-W}^{W} \Phi_{nn}(f)|G_R(f)|^2\,df \int_{-W}^{W} \frac{|X_{rc}(f)|^2}{|C(f)|^2|G_R(f)|^2}\,df \tag{9-2-74}$$


†By setting $x_0 = 1$ and $I_m = \pm d$, the scaling by $x_0$ is incorporated into the parameter $d$. 

The optimum $|G_R(f)|$ can be found by applying the Cauchy-Schwarz 
inequality, 

$$\int_{-\infty}^{\infty} |U_1(f)|^2\,df \int_{-\infty}^{\infty} |U_2(f)|^2\,df \ge \left[\int_{-\infty}^{\infty} |U_1(f)||U_2(f)|\,df\right]^2 \tag{9-2-75}$$

where $|U_1(f)|$ and $|U_2(f)|$ are defined as 

$$|U_1(f)| = |\sqrt{\Phi_{nn}(f)}|\,|G_R(f)|, \qquad |U_2(f)| = \frac{|X_{rc}(f)|}{|C(f)||G_R(f)|} \tag{9-2-76}$$

The minimum value of (9-2-74) is obtained when $|U_1(f)|$ is proportional to 
$|U_2(f)|$, or, equivalently, when 

$$|G_R(f)| = K\frac{|X_{rc}(f)|^{1/2}}{[\Phi_{nn}(f)]^{1/4}|C(f)|^{1/2}}, \qquad |f| \le W \tag{9-2-77}$$

where $K$ is an arbitrary constant. The corresponding modulation filter has a 
magnitude characteristic 

$$|G_T(f)| = \frac{1}{K}\frac{|X_{rc}(f)|^{1/2}[\Phi_{nn}(f)]^{1/4}}{|C(f)|^{1/2}}, \qquad |f| \le W \tag{9-2-78}$$


Finally, the maximum SNR achieved by these optimum transmitting and 
receiving filters is 

$$\frac{d^2}{\sigma_\nu^2} = \frac{P_{av}T}{\left[\int_{-W}^{W} |X_{rc}(f)|[\Phi_{nn}(f)]^{1/2}/|C(f)|\,df\right]^2} \tag{9-2-79}$$

We note that the optimum modulation and demodulation filters are 
specified in magnitude only. The phase characteristics for $G_T(f)$ and $G_R(f)$ 
may be selected so as to satisfy the condition in (9-2-65), i.e., 

$$\Theta_T(f) + \Theta_C(f) + \Theta_R(f) = 2\pi ft_0 \tag{9-2-80}$$

where $\Theta_T(f)$, $\Theta_C(f)$, and $\Theta_R(f)$ are the phase characteristics of the modulation 
filter, the channel, and the demodulation filter, respectively. 

In the special case where the additive noise at the input to the demodulator 
is white gaussian with spectral density $\tfrac{1}{2}N_0$, the optimum filter characteristics 
specified by (9-2-77) and (9-2-78) reduce to 

$$|G_R(f)| = K_1\frac{|X_{rc}(f)|^{1/2}}{|C(f)|^{1/2}}, \qquad |G_T(f)| = K_2\frac{|X_{rc}(f)|^{1/2}}{|C(f)|^{1/2}}, \qquad |f| \le W \tag{9-2-81}$$

where $K_1$ and $K_2$ are arbitrary scale factors. Note that, in this case, $|G_R(f)|$ is 
the matched filter to $|G_T(f)|$. The corresponding SNR at the detector, given by 
(9-2-79), reduces to 

$$\frac{d^2}{\sigma_\nu^2} = \frac{2P_{av}T}{N_0}\left[\int_{-W}^{W}\frac{|X_{rc}(f)|}{|C(f)|}\,df\right]^{-2} \tag{9-2-82}$$
Example 9-2-1 

Let us determine the optimum transmitting and receiving filters for a binary 
communication system that transmits data at a rate of 4800 bits/s over a 
channel with frequency (magnitude) response 

$$|C(f)| = \frac{1}{\sqrt{1 + (f/W)^2}}, \qquad |f| \le W \tag{9-2-83}$$

where $W = 4800$ Hz. The additive noise is zero-mean, white, gaussian with 
spectral density $\tfrac{1}{2}N_0 = 10^{-15}$ W/Hz. 

Since $W = 1/T = 4800$, we use a signal pulse with a raised cosine 
spectrum and $\beta = 1$. Thus, 

$$X_{rc}(f) = \tfrac{1}{2}T[1 + \cos(\pi T|f|)] = T\cos^2\left(\frac{\pi|f|}{9600}\right) \tag{9-2-84}$$

Then, 

$$|G_T(f)| = |G_R(f)| = \left[1 + \left(\frac{f}{4800}\right)^2\right]^{1/4}\cos\left(\frac{\pi|f|}{9600}\right), \qquad |f| \le 4800 \tag{9-2-85}$$

and $|G_T(f)| = |G_R(f)| = 0$, otherwise. Figure 9-2-13 illustrates the filter 
characteristic $G_T(f)$. 

One can now use these optimum filters to determine the amount of 
transmitted energy $\mathcal{E}$ required to achieve a specified error probability. This 
problem is left as an exercise for the reader. 
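The filter shape in this example can be checked numerically. The Python sketch below evaluates the white-noise design rule (9-2-81) for the channel (9-2-83) and the raised cosine spectrum (9-2-84), and compares it with the closed form $[1 + (f/4800)^2]^{1/4}\cos(\pi|f|/9600)$. The function names and the choice of scale factor (unit gain at $f = 0$) are assumptions made for illustration.

```python
import math

W = 4800.0          # channel bandwidth in Hz (from the example)
T = 1.0 / W         # symbol interval, since W = 1/T


def channel_magnitude(f):
    """|C(f)| of (9-2-83)."""
    return 1.0 / math.sqrt(1.0 + (f / W) ** 2) if abs(f) <= W else 0.0


def raised_cosine(f):
    """X_rc(f) of (9-2-84) with rolloff factor 1."""
    return 0.5 * T * (1.0 + math.cos(math.pi * T * abs(f))) if abs(f) <= W else 0.0


def optimum_filter_magnitude(f):
    """(9-2-81), with the scale factor chosen (an assumption) so that |G(0)| = 1."""
    if abs(f) > W:
        return 0.0
    return math.sqrt(raised_cosine(f) / (T * channel_magnitude(f)))


def closed_form(f):
    """Closed-form magnitude for this channel, valid for |f| <= 4800 Hz."""
    if abs(f) > W:
        return 0.0
    return (1.0 + (f / 4800.0) ** 2) ** 0.25 * math.cos(math.pi * abs(f) / 9600.0)


for f in (0.0, 1200.0, 2400.0, 3600.0, 4800.0):
    assert abs(optimum_filter_magnitude(f) - closed_form(f)) < 1e-9
```

The factor $[1 + (f/4800)^2]^{1/4}$ is the half of the channel compensation that is carried by each filter; the receiving filter carries the other half.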


FIGURE 9-2-13 Frequency response of optimum transmitter filter.

9-3 PROBABILITY OF ERROR IN DETECTION OF PAM 

In this section, we evaluate the performance of the receiver for demodulating 
and detecting an $M$-ary PAM signal in the presence of additive, white, gaussian 
noise at its input. First, we consider the case in which the transmitter and 
receiver filters $G_T(f)$ and $G_R(f)$ are designed for zero ISI. Then, we consider 
the case in which $G_T(f)$ and $G_R(f)$ are designed such that $x(t) = g_T(t) \star g_R(t)$ 
is either a duobinary signal or a modified duobinary signal. 


9-3-1 Probability of Error for Detection of PAM with Zero ISI 

In the absence of ISI, the received signal sample at the output of the receiving 
matched filter has the form 

$$y_m = x_0 I_m + \nu_m \tag{9-3-1}$$

where 

$$x_0 = \int_{-W}^{W} |G_T(f)|^2\,df = \mathcal{E}_g \tag{9-3-2}$$

and $\nu_m$ is the additive gaussian noise that has zero mean and variance 

$$\sigma_\nu^2 = \tfrac{1}{2}\mathcal{E}_g N_0 \tag{9-3-3}$$

In general, $I_m$ takes one of $M$ possible equally spaced amplitude values with 
equal probability. Given a particular amplitude level, the problem is to 
determine the probability of error. 

The problem of evaluating the probability of error for digital PAM in a 
band-limited, additive white gaussian noise channel, in the absence of ISI, is 
identical to the evaluation of the error probability for $M$-ary PAM as given in 
Section 5-2. The final result that is obtained from the derivation is 

$$P_M = \frac{2(M-1)}{M} Q\left(\sqrt{\frac{2\mathcal{E}_g}{N_0}}\right) \tag{9-3-4}$$

But $\mathcal{E}_g = 3\mathcal{E}_{av}/(M^2 - 1)$, where $\mathcal{E}_{av} = k\mathcal{E}_{bav}$ is the average energy per symbol and $\mathcal{E}_{bav}$ 
is the average energy per bit. Hence, 

$$P_M = \frac{2(M-1)}{M} Q\left(\sqrt{\frac{6(\log_2 M)\mathcal{E}_{bav}}{(M^2-1)N_0}}\right) \tag{9-3-5}$$

This is exactly the form for the probability of error of M-ary PAM derived in 
Section 5-2 (see (5-2-46)). In the treatment of PAM given in this chapter, we 
imposed the additional constraint that the transmitted signal is band-limited to 
the bandwidth allocated for the channel. Consequently, the transmitted signal 
pulses were designed to be band-limited and to have zero ISI. 
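For readers who want numbers, the error probability of $M$-ary PAM with zero ISI can be evaluated directly. The sketch below expresses $Q(x)$ through the complementary error function and implements the standard form $P_M = \frac{2(M-1)}{M}Q\left(\sqrt{6(\log_2 M)\mathcal{E}_{bav}/((M^2-1)N_0)}\right)$; the function names and the 10 dB operating point are illustrative assumptions.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x), written with the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pam_symbol_error(M, ebn0_db):
    """Symbol error probability of M-ary PAM with zero ISI, as in (9-3-5)."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)      # average bit energy over N_0, linear scale
    k = math.log2(M)                     # bits per symbol
    return 2.0 * (M - 1) / M * q_function(math.sqrt(6.0 * k * ebn0 / (M * M - 1)))


for M in (2, 4, 8):
    print(M, pam_symbol_error(M, 10.0))
```

For $M = 2$ the expression reduces to $Q(\sqrt{2\mathcal{E}_b/N_0})$, the familiar result for binary antipodal signaling, and the error probability grows with $M$ at a fixed bit SNR.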

In contrast, no bandwidth constraint was imposed on the PAM signals 
considered in Section 5-2. Nevertheless, the receivers (demodulators and 
detectors) in both cases are optimum (matched filters) for the corresponding 
transmitted signals. Consequently, no loss in error rate performance results 
from the bandwidth constraint when the signal pulse is designed for zero ISI 
and the channel does not distort the transmitted signal. 

FIGURE 9-3-1 Block diagram of modulator and demodulator for partial-response signals.


9-3-2 Probability of Error for Detection of Partial-Response Signals 

In this section we determine the probability of error for detection of digital 
$M$-ary PAM signaling using duobinary and modified duobinary pulses. The 
channel is assumed to be an ideal bandlimited channel with additive white 
gaussian noise. The model for the communication system is shown in Fig. 
9-3-1. 

We consider two types of detectors. The first is the symbol-by-symbol 
detector and the second is the optimum ML sequence detector described in the 
previous section. 

Symbol-by-Symbol Detector At the transmitter, the $M$-level data 
sequence $\{D_m\}$ is precoded as described previously. The precoder output is 
mapped into one of $M$ possible amplitude levels. Then the transmitting filter 
with frequency response $G_T(f)$ has an output 

$$v(t) = \sum_{n=-\infty}^{\infty} I_n g_T(t - nT) \tag{9-3-6}$$

The partial-response function $X(f)$ is divided equally between the transmitting 
and receiving filters. Hence, the receiving filter is matched to the transmitted 
pulse, and the cascade of the two filters results in the frequency characteristic 

$$|G_T(f)G_R(f)| = |X(f)| \tag{9-3-7}$$

The matched filter output is sampled at $t = nT = n/2W$ and the samples are fed 
to the decoder. For the duobinary signal, the output of the matched filter at the 
sampling instant may be expressed as 

$$y_m = I_m + I_{m-1} + \nu_m = B_m + \nu_m \tag{9-3-8}$$

where $\nu_m$ is the additive noise component. Similarly, the output of the matched 
filter for the modified duobinary signal is 

$$y_m = I_m - I_{m-2} + \nu_m = B_m + \nu_m \tag{9-3-9}$$

For binary transmission, let $I_m = \pm d$, where $2d$ is the distance between signal 
levels. Then, the corresponding values of $B_m$ are $(2d, 0, -2d)$. For $M$-ary PAM 
signal transmission, where $I_m = \pm d, \pm 3d, \ldots, \pm(M-1)d$, the received signal 
levels are $B_m = 0, \pm 2d, \pm 4d, \ldots, \pm 2(M-1)d$. Hence, the number of received 
levels is $2M - 1$, and the scale factor $d$ is equivalent to $x_0 = \mathcal{E}_g$. 

The input transmitted symbols $\{I_m\}$ are assumed to be equally probable. 
Then, for duobinary and modified duobinary signals, it is easily demonstrated 
that, in the absence of noise, the received output levels have a (triangular) 
probability distribution of the form 

$$P(B = 2md) = \frac{M - |m|}{M^2}, \qquad m = 0, \pm 1, \pm 2, \ldots, \pm(M-1) \tag{9-3-10}$$

where $B$ denotes the noise-free received level and $2d$ is the distance between 
any two adjacent received signal levels. 

The channel corrupts the signal transmitted through it by the addition of 
white gaussian noise with zero mean and power spectral density $\tfrac{1}{2}N_0$. 

We assume that a symbol error occurs whenever the magnitude of the 
additive noise exceeds the distance $d$. This assumption neglects the rare event 
that a large noise component with magnitude exceeding $d$ may result in a 
received signal level that yields a correct symbol decision. The noise 
component $\nu_m$ is zero-mean gaussian with variance 

$$\sigma_\nu^2 = \tfrac{1}{2}N_0\int_{-W}^{W} |G_R(f)|^2\,df = \tfrac{1}{2}N_0\int_{-W}^{W} |X(f)|\,df = \frac{2N_0}{\pi} \tag{9-3-11}$$

for both the duobinary and the modified duobinary signals. Hence, an upper 
bound on the symbol probability of error is 

$$\begin{aligned} P_M &< \sum_{m=-(M-2)}^{M-2} P(|y - 2md| > d \mid B = 2md)\,P(B = 2md) \\ &\quad + 2P(y + 2(M-1)d > d \mid B = -2(M-1)d)\,P(B = -2(M-1)d) \\ &= P(|y| > d \mid B = 0)\left[2\sum_{m=0}^{M-1} P(B = 2md) - P(B = 0) - P(B = -2(M-1)d)\right] \\ &= (1 - M^{-2})\,P(|y| > d \mid B = 0) \end{aligned} \tag{9-3-12}$$

But 

$$P(|y| > d \mid B = 0) = \frac{2}{\sqrt{2\pi}\,\sigma_\nu}\int_d^{\infty} e^{-x^2/2\sigma_\nu^2}\,dx = 2Q\left(\sqrt{\frac{\pi d^2}{2N_0}}\right) \tag{9-3-13}$$

Therefore, the average probability of a symbol error is upper-bounded as 

$$P_M < 2(1 - M^{-2})\,Q\left(\sqrt{\frac{\pi d^2}{2N_0}}\right) \tag{9-3-14}$$

The scale factor $d$ in (9-3-14) can be eliminated by expressing it in terms of 
the average power transmitted into the channel. For the $M$-ary PAM signal in 
which the transmitted levels are equally probable, the average power at the 
output of the transmitting filter is 

$$P_{av} = \frac{E(I_m^2)}{T}\int_{-W}^{W} |G_T(f)|^2\,df = \frac{E(I_m^2)}{T}\int_{-W}^{W} |X(f)|\,df = \frac{4}{\pi T}E(I_m^2) \tag{9-3-15}$$

where $E(I_m^2)$ is the mean square value of the $M$ signal levels, which is 

$$E(I_m^2) = \tfrac{1}{3}d^2(M^2 - 1) \tag{9-3-16}$$

Therefore, 

$$d^2 = \frac{3\pi P_{av}T}{4(M^2 - 1)} \tag{9-3-17}$$

By substituting the value of $d^2$ from (9-3-17) into (9-3-14), we obtain the upper 
bound on the symbol error probability as 

$$P_M < 2\left(1 - \frac{1}{M^2}\right) Q\left(\sqrt{\left(\frac{\pi}{4}\right)^2 \frac{6}{M^2 - 1}\frac{\mathcal{E}_{av}}{N_0}}\right) \tag{9-3-18}$$

where $\mathcal{E}_{av}$ is the average energy per transmitted symbol, which can also be 
expressed in terms of the average bit energy as $\mathcal{E}_{av} = k\mathcal{E}_{bav} = (\log_2 M)\mathcal{E}_{bav}$. 

The expression in (9-3-18) for the probability of error of $M$-ary PAM holds 
for both duobinary and modified duobinary partial-response signals. If we 
compare this result with the error probability of $M$-ary PAM with zero ISI, 
which can be obtained by using a signal pulse with a raised cosine spectrum, we 
note that the performance of partial response duobinary or modified duobinary 
has a loss of $(\tfrac{1}{4}\pi)^2$, or 2.1 dB. This loss in SNR is due to the fact that the 
detector for the partial response signals makes decisions on a symbol-by-symbol 
basis, thus ignoring the inherent memory contained in the received 
signal at the input to the detector. 
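The 2.1 dB figure follows from the factor $(\pi/4)^2$ that multiplies the SNR inside the $Q$-function. The short sketch below computes that penalty and evaluates both the zero-ISI error probability and the partial-response upper bound at a common symbol SNR; the function names and the sample SNR value are illustrative assumptions.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def zero_isi_error(M, es_n0):
    """M-ary PAM with zero ISI, written in terms of the symbol SNR E_av/N_0."""
    return 2.0 * (M - 1) / M * q_function(math.sqrt(6.0 * es_n0 / (M * M - 1)))


def partial_response_bound(M, es_n0):
    """Upper bound for symbol-by-symbol detection of (modified) duobinary PAM."""
    arg = (math.pi / 4.0) ** 2 * 6.0 * es_n0 / (M * M - 1)
    return 2.0 * (1.0 - 1.0 / (M * M)) * q_function(math.sqrt(arg))


loss_db = -10.0 * math.log10((math.pi / 4.0) ** 2)   # SNR penalty in decibels
print(round(loss_db, 2))

es_n0 = 10.0 ** (12.0 / 10.0)                        # 12 dB, an arbitrary test point
print(zero_isi_error(4, es_n0), partial_response_bound(4, es_n0))
```

The penalty depends only on the constant $(\pi/4)^2$, so it is the same for every $M$ and for both partial-response pulses.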


Maximum-Likelihood Sequence Detector The ML sequence detector 
searches through the trellis for the most probable transmitted sequence $\{I_m\}$ as 
previously described in Section 9-2-3. At each stage of the search process the 
detector compares the metrics of paths that merge at each of the nodes and 
selects the path that is most probable at each node. The performance of the 
detector may be evaluated by determining the probability of error events, 
based on a euclidean distance metric, as was done for soft-decision decoding of 
convolutional codes. The general derivation is given in Section 10-1-4. In the 
case of the duobinary and modified duobinary signals, it is demonstrated that 
the 2.1 dB loss inherent in the suboptimum symbol-by-symbol detector is 
completely recovered by the ML sequence detector. 


9-3-3 Probability of Error for Optimum Signals in a Channel with Distortion 

In Section 9-2-4, we derived the filter responses for the modulation and 
demodulation filters that maximize the SNR at the input to the detector when 
there is channel distortion. When the filters are designed for zero ISI at the 
sampling instants, the probability of error for $M$-ary PAM is 

$$P_M = \frac{2(M-1)}{M} Q\left(\sqrt{\frac{d^2}{\sigma_\nu^2}}\right) \tag{9-3-19}$$

The parameter $d$ is related to the average transmitted power as 

$$P_{av} = \frac{d^2(M^2 - 1)}{3T}\int_{-W}^{W} |G_T(f)|^2\,df \tag{9-3-20}$$

and the noise variance is given by (9-2-69). For AWGN, (9-3-19) may be 
expressed as 

$$P_M = \frac{2(M-1)}{M} Q\left(\sqrt{\frac{6P_{av}T}{(M^2 - 1)N_0}}\left[\int_{-W}^{W}\frac{X_{rc}(f)}{|C(f)|}\,df\right]^{-1}\right) \tag{9-3-21}$$

Finally, we observe that the loss due to channel distortion is 

$$20\log_{10}\left[\int_{-W}^{W}\frac{X_{rc}(f)}{|C(f)|}\,df\right] \tag{9-3-22}$$

Note that when $C(f) = 1$ for $|f| \le W$, the channel is ideal and 

$$\int_{-W}^{W} X_{rc}(f)\,df = 1 \tag{9-3-23}$$

so that no loss is incurred. On the other hand, when there is amplitude 
distortion, $|C(f)| < 1$ for some range of frequencies in the band $|f| \le W$ and, 
hence, there is a loss in SNR incurred, as given by (9-3-22). This loss is 
independent of channel phase distortion, because phase distortion has been 
perfectly compensated, as implied by (9-2-80). The loss given by (9-3-22) is due 
entirely to amplitude distortion and is a measure of the noise enhancement 
resulting from the receiving filter, which compensates for the channel 
distortion. 
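The distortion loss, $20\log_{10}\left[\int_{-W}^{W} X_{rc}(f)/|C(f)|\,df\right]$, can be estimated by numerical integration. The following sketch uses a simple midpoint rule and assumes a raised cosine spectrum with rolloff factor 1 and $T = 1/W$. As a test case it uses the channel $|C(f)| = 1/\sqrt{1 + (f/W)^2}$ with $W = 4800$ Hz from Example 9-2-1. The function name and the number of integration steps are illustrative choices.

```python
import math


def distortion_loss_db(channel_magnitude, W, steps=20000):
    """Loss in dB by midpoint integration of X_rc(f)/|C(f)| over |f| <= W,
    with X_rc(f) of rolloff 1 and T = 1/W (both assumed here)."""
    T = 1.0 / W
    df = 2.0 * W / steps
    total = 0.0
    for i in range(steps):
        f = -W + (i + 0.5) * df
        x_rc = 0.5 * T * (1.0 + math.cos(math.pi * T * abs(f)))
        total += x_rc / channel_magnitude(f) * df
    return 20.0 * math.log10(total)


W = 4800.0
ideal = distortion_loss_db(lambda f: 1.0, W)
example = distortion_loss_db(lambda f: 1.0 / math.sqrt(1.0 + (f / W) ** 2), W)
print(ideal, example)
```

For the ideal channel the integral equals 1 and the loss is 0 dB. For the example channel the loss is positive but stays below about 3 dB, because $1/|C(f)|$ never exceeds $\sqrt{2}$ inside the band.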

9-4 MODULATION CODES FOR SPECTRUM SHAPING 

We have observed that the power spectral density of a digital communication 
signal can be controlled and shaped by selecting the transmitted signal pulse 
$g(t)$ and by introducing correlation through coding, which is used to combat 
channel distortion and noise in transmission. Coding for spectrum shaping is 
introduced following the channel encoding so that the spectrum of the 
transmitted signal matches the spectral characteristics of a baseband or 
equivalent lowpass channel. 

Codes that are used for spectrum shaping are generally called either 
modulation codes, or line codes, or data translation codes. Such codes generally 
place restrictions on the sequence of bits into the modulator and, thus, 
introduce correlation and, hence, memory into the transmitted signal. It is this 
type of coding that is treated in this section. 

Modulation codes are usually employed in magnetic recording, in optical 
recording, and in digital communications over cable systems to achieve spectral 
shaping and to eliminate or minimize the d.c. content in the transmitted (or 
stored) baseband signal. In magnetic recording channels, the modulation code 
is designed to increase the distance between transitions in the recorded 
waveform and, thus, intersymbol interference effects are also reduced. 

As an example of the use of a modulation code, let us consider a magnetic 
recording system, which consists of the elements shown in the block diagram of 
Fig. 9-4-1. The binary data sequence to be stored is used to generate a write 
current. This current may be viewed as the output of the “modulator.” The 
most commonly used method to map the information sequence into the write 
current waveform is NRZI, which was described in Section 4-3-2. Recall that in 
NRZI, a transition from one amplitude to another ($A$ to $-A$ or $-A$ to $A$) 
occurs only when the information bit is a 1. No transition occurs when the 
information bit is a 0, i.e., the amplitude level remains the same as in the 
previous signal interval. The positive amplitude pulse results in magnetizing 
the medium in one (direction) polarity and the negative pulse magnetizes the 
medium in the opposite (direction) polarity. 

FIGURE 9-4-1 Block diagram of magnetic storage read/write system.

FIGURE 9-4-2 Read-back pulse in magnetic recording system.

Since the input data sequence is basically random with equally probable 1s 
and 0s, we shall encounter level transitions from A to -A or -A to A with 
probability 1/2 for every data bit. The readback signal for a positive transition 
(-A to A) is a pulse that is well modeled mathematically as 

$$p(t) = \frac{1}{1 + (2t/T_{50})^2} \tag{9-4-1}$$

where $T_{50}$ is defined as the width of the pulse at its 50% amplitude level, as 
shown in Fig. 9-4-2. Similarly, the readback signal for a negative transition (A 
to -A) is the pulse $-p(t)$. The value of $T_{50}$ is determined by the characteristics 
of the medium, the read/write heads, and the distance of the head to the 
medium. 

Now, suppose we write a positive transition followed by a negative 
transition. Let us vary the time interval between the two transitions, which we 
denote as $T_b$ (the bit time interval). Figure 9-4-3 illustrates the readback signal 
pulses, which are obtained by a superposition of $p(t)$ with $-p(t - T_b)$. The 
parameter $\Delta = T_{50}/T_b$ is defined as the normalized density. The closer the bit 
transitions ($T_b$ small), the larger will be the value of the normalized density 
and, hence, the larger will be the packing density. We notice that as Δ is 
increased, the peak amplitudes of the readback signal are reduced and are also 
shifted in time from the desired time instants. In other words, the pulses 
interfere with one another, thus limiting the density with which we can write. 
This problem serves as a motivation to design modulation codes that take the 
original data sequence and transform (encode) it into another sequence that 
results in a write waveform in which amplitude transitions are spaced farther 
apart. For example, if we use NRZI, the encoded sequence into the modulator 
must contain one or more 0s between 1s. 
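
The loss of peak amplitude and the peak shift described above can be checked numerically. The following is a minimal sketch, assuming the Lorentzian pulse of Eq. (9-4-1) with $T_{50}$ normalized to 1; the function names and the simple grid search are illustrative choices, not part of the text.

```python
def readback_pulse(t, t50=1.0):
    """Lorentzian read-back pulse p(t) = 1 / (1 + (2t/T50)^2), as in Eq. (9-4-1)."""
    return 1.0 / (1.0 + (2.0 * t / t50) ** 2)


def dibit_peak(density, t50=1.0, steps=20000):
    """Grid-search the positive peak of p(t) - p(t - Tb).

    density is the normalized density T50/Tb. Returns (peak value, peak time).
    An isolated positive transition would give peak value 1 at time 0.
    """
    tb = t50 / density
    lo, hi = -2.0 * t50, 0.5 * tb  # the positive lobe lies to the left of t = Tb/2
    best_value, best_time = float("-inf"), lo
    for i in range(steps + 1):
        t = lo + (hi - lo) * i / steps
        value = readback_pulse(t, t50) - readback_pulse(t - tb, t50)
        if value > best_value:
            best_value, best_time = value, t
    return best_value, best_time
```

Calling `dibit_peak` for increasing densities (for example 0.5, 1.0, 2.0) shows the peak value falling below 1 and the peak time moving to negative values, that is, away from the ideal instant t = 0.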

The second problem encountered in magnetic recording is the need to avoid 
(or minimize) having a d.c. content in the modulated signal (the write current) 
due to the frequency response characteristics of the readback system and 
associated electronics. This requirement also arises in digital communication 
over cable channels. This problem can be overcome by altering (encoding) the 
data sequence into the modulator. A class of codes that satisfy these objectives 
are the modulation codes described below. 

Runlength-Limited Codes Codes that have a restriction on the number of 
consecutive 1s or 0s in a sequence are generally called runlength-limited codes. 
These codes are generally described by two parameters, say d and κ, where d 
denotes the minimum number of 0s between two 1s in a sequence, and κ 
denotes the maximum number of 0s between two 1s in a sequence. When used 
with NRZI modulation, the effect of placing d zeros between successive 1s is to 
spread the transitions farther apart, thus reducing the overlap in the channel 
response due to successive transitions and hence reducing the intersymbol 
interference. Setting an upper limit κ on the runlength of 0s ensures that 
transitions occur frequently enough so that symbol timing information can be 
recovered from the received modulated signal. Runlength-limited codes are 
usually called (d, κ) codes.† 

The (d, κ) code sequence constraints may be represented by a finite-state 
sequential machine with κ + 1 states, denoted as $S_i$, $1 \le i \le \kappa + 1$, as shown in 
Fig. 9-4-4. We observe that an output data bit 0 takes the sequence from state 
$S_i$ to $S_{i+1}$, $i \le \kappa$. The output data bit 1 takes the sequence to state $S_1$. The 
output bit from the encoder may be a 1 only when the sequence is in state $S_i$, 
$d + 1 \le i \le \kappa + 1$. When the sequence is in state $S_{\kappa+1}$, the output bit is always 1. 

FIGURE 9-4-4 Finite-state sequential machine for a (d, κ)-coded sequence. 

†In fact, they are usually called (d, k) codes, where k is the maximum runlength of zeros. We 
have substituted the Greek letter kappa κ for k, to avoid confusion with our previous use of k. 

The finite-state sequential machine may also be represented by a state 
transition matrix, denoted as D, which is a square (κ + 1) × (κ + 1) matrix with 
elements $d_{ij}$, where 

$$d_{ij} = \begin{cases} 1 & (j = 1,\ i \ge d + 1) \\ 1 & (j = i + 1) \\ 0 & (\text{otherwise}) \end{cases} \tag{9-4-2}$$

Example 9-4-1 


Let us determine the state transition matrix for a (d, κ) = (1, 3) code. The 
(1, 3) code has four states. From Fig. 9-4-4, we obtain its state transition 
matrix, which is 

$$D = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix} \tag{9-4-3}$$
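
The rule of Eq. (9-4-2) translates directly into a few lines of code. The sketch below is illustrative only; the function name `state_transition_matrix` is an assumption, and the matrix is stored as a plain list of rows with state $S_i$ held at index i - 1.

```python
def state_transition_matrix(d, kappa):
    """Build the (kappa+1) x (kappa+1) matrix D of Eq. (9-4-2) for a (d, kappa) code."""
    size = kappa + 1
    D = [[0] * size for _ in range(size)]
    for i in range(1, size + 1):      # 1-based state index, as in the text
        if i >= d + 1:
            D[i - 1][0] = 1           # output bit 1: return to state S_1
        if i < size:
            D[i - 1][i] = 1           # output bit 0: advance to state S_{i+1}
    return D
```

For (d, κ) = (1, 3), `state_transition_matrix(1, 3)` reproduces the matrix of Eq. (9-4-3).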


An important parameter of any (d, κ) code is the number of sequences of 
a certain length, say n, that satisfy the (d, κ) constraints. As n is allowed to 
increase, the number of sequences N(n) that satisfy the (d, κ) constraint 
also increases. The number of information bits that can be uniquely 
represented with N(n) code sequences is 

$$k = \lfloor \log_2 N(n) \rfloor$$

where $\lfloor x \rfloor$ denotes the largest integer contained in x. The maximum code 
rate is then $R_c = k/n$. 
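
For short block lengths, N(n) can be found by brute force. The sketch below makes one assumption that the text does not spell out: a length-n word is counted if every run of 0s between two 1s has length between d and κ, and the runs of 0s at either end are no longer than κ (so that the word could appear inside a longer constrained sequence). The helper names are hypothetical.

```python
from itertools import product
from math import floor, log2


def satisfies(bits, d, kappa):
    """True if the word obeys the (d, kappa) constraint under the assumption above."""
    ones = [i for i, b in enumerate(bits) if b == 1]
    if not ones:
        return len(bits) <= kappa
    if ones[0] > kappa or len(bits) - 1 - ones[-1] > kappa:
        return False
    return all(d <= b - a - 1 <= kappa for a, b in zip(ones, ones[1:]))


def count_sequences(n, d, kappa):
    """N(n): number of length-n binary words that satisfy the constraint."""
    return sum(1 for word in product((0, 1), repeat=n) if satisfies(word, d, kappa))


def max_info_bits(n, d, kappa):
    """k = floor(log2 N(n)), the number of bits representable with N(n) words."""
    return floor(log2(count_sequences(n, d, kappa)))
```

For the (1, 3) constraint and n = 4, this count gives N(4) = 7, so k = 2 and the maximum rate at that block length is 2/4.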

The capacity of a (d, κ) code is defined as 

$$C(d, \kappa) = \lim_{n \to \infty} \frac{1}{n} \log_2 N(n) \tag{9-4-4}$$

Clearly, C(d, κ) is the maximum possible rate that can be achieved with the 
(d, κ) constraints. Shannon (1948) showed that the capacity is given as 

$$C(d, \kappa) = \log_2 \lambda_{\max} \tag{9-4-5}$$

where $\lambda_{\max}$ is the largest real eigenvalue of the state transition matrix D. 


Example 9-4-2 


Let us determine the capacity of a (d, κ) = (1, 3) code. Using the 
state-transition matrix given in Example 9-4-1 for the (1, 3) code, we have 

$$\det(D - \lambda I) = \det \begin{bmatrix} -\lambda & 1 & 0 & 0 \\ 1 & -\lambda & 1 & 0 \\ 1 & 0 & -\lambda & 1 \\ 1 & 0 & 0 & -\lambda \end{bmatrix} = \lambda^4 - \lambda^2 - \lambda - 1 = 0 \tag{9-4-6}$$

The maximum real root of this polynomial is found to be $\lambda_{\max} = 1.4656$. 
Therefore, the capacity C(1, 3) = $\log_2 \lambda_{\max}$ = 0.5515. 

TABLE 9-4-1 CAPACITY C(d, κ) VERSUS RUNLENGTH PARAMETERS d AND κ 

κ     d = 0   d = 1   d = 2   d = 3   d = 4   d = 5   d = 6 
2     .8791   .4057 
3     .9468   .5515   .2878 
4     .9752   .6174   .4057   .2232 
5     .9881   .6509   .4650   .3218   .1823 
6     .9942   .6690   .4979   .3746   .2669   .1542 
7     .9971   .6793   .5174   .4057   .3142   .2281   .1335 
8     .9986   .6853   .5293   .4251   .3432   .2709   .1993 
9     .9993   .6888   .5369   .4376   .3620   .2979   .2382 
10    .9996   .6909   .5418   .4460   .3746   .3158   .2633 
11    .9998   .6922   .5450   .4516   .3833   .3285   .2804 
12    .9999   .6930   .5471   .4555   .3894   .3369   .2924 
13    .9999   .6935   .5485   .4583   .3937   .3432   .3011 
14    .9999   .6938   .5495   .4602   .3968   .3478   .3074 
15    .9999   .6939   .5501   .4615   .3991   .3513   .3122 
∞     1.000   .6942   .5515   .4650   .4057   .3620   .3282 
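
The capacity of Eq. (9-4-5) can be evaluated numerically for any (d, κ) pair with finite κ. The following sketch is one possible way to do it, not the method used in the text: it rebuilds D from Eq. (9-4-2) and estimates the largest eigenvalue by power iteration in pure Python. The function names and the fixed iteration count are assumptions.

```python
from math import log2


def state_transition_matrix(d, kappa):
    """Matrix D of Eq. (9-4-2); state S_i is stored at index i - 1."""
    size = kappa + 1
    D = [[0] * size for _ in range(size)]
    for i in range(1, size + 1):
        if i >= d + 1:
            D[i - 1][0] = 1
        if i < size:
            D[i - 1][i] = 1
    return D


def largest_eigenvalue(D, iterations=2000):
    """Power iteration with max-norm scaling; D is nonnegative with no zero row."""
    n = len(D)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iterations):
        w = [sum(D[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)                  # v is scaled so that max(v) = 1
        v = [x / lam for x in w]
    return lam


def capacity(d, kappa):
    """C(d, kappa) = log2 of the largest eigenvalue of D, Eq. (9-4-5)."""
    return log2(largest_eigenvalue(state_transition_matrix(d, kappa)))
```

With this sketch, `capacity(1, 3)` agrees with the value 0.5515 obtained in Example 9-4-2, and other finite-κ entries of Table 9-4-1 can be checked the same way.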

The capacities of (d, κ) codes for 0 ≤ d ≤ 6 and 2 ≤ κ ≤ 15 are given in 
Table 9-4-1. We observe that C(d, κ) < 1/2 for d ≥ 3 and any value of κ. The 
most commonly used codes for magnetic recording employ d ≤ 2; hence, their 
rate $R_c$ is at least 1/2. 

Now let us turn our attention to the construction of some runlength-limited 
codes. In general, (d, κ) codes can be constructed either as fixed-length codes 
or as variable-length codes. In a fixed-length code, each bit or block of k bits is 
encoded into a block of n > k bits. 

In principle, the construction of a fixed-length code is straightforward. For a 
given block length n, we may select the subset of the $2^n$ code words that satisfy 
the specified runlength constraints. From this subset, we eliminate code words 
that do not satisfy the runlength constraints when concatenated. Thus, we 
obtain a set of code words that satisfy the constraints and can be used in the 
mapping of the input data bits to the encoder. The encoding and decoding 
operations can be performed by use of a look-up table. 


Example 9-4-3 

Let us construct a d = 0, κ = 2 code of length n = 3, and determine its 
efficiency. By listing all the code words, we find that the following five code 
words satisfy the (0, 2) constraint: (0 1 0), (0 1 1), (1 0 1), (1 1 0), (1 1 1). We 
may select any four of these code words and use them to encode the pairs of 
data bits (00, 01, 10, 11). Thus, we have a rate k/n = 2/3 code that satisfies 
the (0, 2) constraint. 

The fixed-length code in this example is not very efficient. The capacity is 
C(0, 2) = 0.8791, so that this code has an efficiency of 

$$\text{efficiency} = \frac{R_c}{C(d, \kappa)} = \frac{2/3}{0.8791} = 0.76$$

Surely, better (0, 2) codes can be constructed by increasing the block length 
n. 


In the following example, we place no restriction on the maximum 
runlength of zeros. 


Example 9-4-4 

Let us construct a d = 1, κ = ∞ code of length n = 5. In this case, we are 
placing no constraint on the number of consecutive zeros. To construct the 
code, we select from the set of 32 possible code words those that satisfy the 
d = 1 constraint. There are eight such code words, which implies that we can 
encode three information bits with each code word. The code is given in 
Table 9-4-2. Note that the first bit of each code word is a 0, whereas the last 
bit may be either 0 or 1. Consequently, the d = 1 constraint is satisfied when 
these code words are concatenated. This code has a rate $R_c = 3/5$. When 
compared with the capacity C(1, ∞) = 0.6942 obtained from Table 9-4-1, the 
code efficiency is 0.864, which is quite acceptable. 
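
The code-word lists of Examples 9-4-3 and 9-4-4 can be reproduced by enumeration. The sketch below uses one sufficient rule for free concatenation, which is an assumption and not necessarily the procedure behind the examples: the number of leading 0s in a word is restricted to a chosen range, and the allowed number of trailing 0s is then derived so that any two words joined together still satisfy the (d, κ) constraint. All names are hypothetical.

```python
from itertools import product

INF = float("inf")


def zero_runs(word):
    """Return (leading 0s, list of 0-runs between 1s, trailing 0s); word must contain a 1."""
    ones = [i for i, b in enumerate(word) if b == 1]
    interior = [b - a - 1 for a, b in zip(ones, ones[1:])]
    return ones[0], interior, len(word) - 1 - ones[-1]


def concatenable_words(n, d, kappa, lead_min, lead_max):
    """Length-n words that can be concatenated freely under the (d, kappa) constraint."""
    trail_min = max(d - lead_min, 0)
    trail_max = INF if kappa == INF else kappa - lead_max
    words = []
    for word in product((0, 1), repeat=n):
        if 1 not in word:
            if kappa == INF:
                words.append(word)    # all-zero word is allowed when runs of 0s are unbounded
            continue
        lead, interior, trail = zero_runs(word)
        if (lead_min <= lead <= lead_max
                and trail_min <= trail <= trail_max
                and all(d <= r <= kappa for r in interior)):
            words.append(word)
    return words
```

With `concatenable_words(3, 0, 2, 0, 1)` the five words of Example 9-4-3 are obtained, and `concatenable_words(5, 1, INF, 1, INF)` returns the eight words of Table 9-4-2, each beginning with a 0.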

The code construction method described in the two examples above 
produces fixed-length (d, κ) codes that are state-independent. By 
state-independent, we mean that fixed-length code words can be concatenated 
without violating the (d, κ) constraints. In general, fixed-length 
state-independent (d, κ) codes require large block lengths, except in cases such as 
those in the examples above where d is small. Simpler (shorter-length) codes 
are generally possible by allowing for state-dependence and for variable-length 
code words. Below, we consider codes for which both the input blocks to the 
encoder and the output blocks may have variable length. For the code words to 
be uniquely decodable at the receiver, the variable-length code should satisfy 
the prefix condition, described in Chapter 3. 

TABLE 9-4-2 FIXED LENGTH d = 1, κ = ∞ CODE 

Input data bits    Output coded sequence 
000                00000 
001                00001 
010                00010 
011                00100 
100                00101 
101                01000 
110                01001 
111                01010 


Example 9-4-5 

A very simple uniquely decodable variable-length d = 0, κ = 2 code is 

0 → 01 
10 → 10 
11 → 11 

The code in the above example has a fixed output block size but a variable 
input block size. In general, both the input and output blocks may be variable. 
The following example illustrates the latter case. 


Example 9-4-6 

Let us construct a (2, 7) variable block size code. The solution to this code 
construction is certainly not unique, nor is it trivial. We picked this example 
because the (2, 7) code has been widely used by IBM in many of its disk 
storage systems. The code is listed in Table 9-4-3. We observe that the input 
data blocks of 2, 3, and 4 bits are mapped into output data blocks of 4, 6, 
and 8 bits, respectively. Hence, the code rate is $R_c = 1/2$. Since this is the 
code rate for all code words, the code is called a fixed-rate code. This code 
has an efficiency of 0.5/0.5174 = 0.966. Note that this code satisfies the prefix 
condition. 
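
Because the code book of Table 9-4-3 satisfies the prefix condition, encoding and decoding reduce to table look-ups. The sketch below is an illustration only; the function names, the use of bit strings, and the error handling for an input that stops in the middle of a code-book entry are assumptions.

```python
CODE_BOOK_2_7 = {
    "10": "1000",
    "11": "0100",
    "011": "000100",
    "010": "001000",
    "000": "100100",
    "0011": "00100100",
    "0010": "00001000",
}
DECODE_BOOK_2_7 = {coded: data for data, coded in CODE_BOOK_2_7.items()}


def _translate(bits, book, lengths):
    """Greedy prefix parse of a bit string using the given look-up table."""
    out, i = [], 0
    while i < len(bits):
        for length in lengths:
            word = bits[i:i + length]
            if word in book:
                out.append(book[word])
                i += length
                break
        else:
            raise ValueError("input ends in the middle of a code-book entry")
    return "".join(out)


def encode_2_7(data_bits):
    """Map data blocks of 2, 3, or 4 bits into coded blocks of 4, 6, or 8 bits."""
    return _translate(data_bits, CODE_BOOK_2_7, (2, 3, 4))


def decode_2_7(coded_bits):
    """Inverse mapping from coded blocks back to data blocks."""
    return _translate(coded_bits, DECODE_BOOK_2_7, (4, 6, 8))


def interior_zero_runs(bits):
    """Lengths of the runs of 0s that lie between successive 1s."""
    ones = [i for i, b in enumerate(bits) if b == "1"]
    return [b - a - 1 for a, b in zip(ones, ones[1:])]
```

For instance, `encode_2_7("1011010")` parses the input as 10, 11, 010 and every run of 0s between 1s in the result has a length between 2 and 7.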


TABLE 9-4-3 CODE BOOK FOR VARIABLE-LENGTH (2, 7) CODE 

Input data bits    Output coded sequence 
10                 1000 
11                 0100 
011                000100 
010                001000 
000                100100 
0011               00100100 
0010               00001000 

TABLE 9-4-4 ENCODER FOR (1, 3) MILLER CODE 

Input data bits    Output coded sequence 
0                  x0 
1                  01 

x = 0, if preceding input bit is 1 
x = 1, if preceding input bit is 0 

FIGURE 9-4-5 State diagrams for d = 1, κ = 3 (Miller) code. 

FIGURE 9-4-6 


Another code that has been widely used in magnetic recording is the rate 
1/2, (d, κ) = (1, 3) code in Table 9-4-4. We observe that when the information 
bit is a 0, the first output bit is a 1 if the previous input bit was a 0, or a 0 if the 
previous input bit was a 1. When the information bit is a 1, the encoder output 
is 01. Decoding of this code is simple. The first bit of the two-bit block is 
redundant and may be discarded. The second bit is the information bit. This 
code is usually called the Miller code. We observe that this is a state-dependent 
code, which is described by the state diagram shown in Fig. 9-4-5. There are 
two states labeled $S_1$ and $S_2$ with transitions as shown in the figure. When the 
encoder is in state $S_1$, an input bit 1 results in the encoder staying in state $S_1$ and 
outputting 01. This is denoted as 1/01. If the input bit is a 0, the encoder enters 
state $S_2$ and outputs 00. This is denoted as 0/00. Similarly, if the encoder is in 
state $S_2$, an input bit 0 causes no transition and the encoder output is 10. On 
the other hand, if the input bit is a 1, the encoder enters state $S_1$ and outputs 
01. Figure 9-4-6 shows the trellis for the Miller code. 
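
The encoding rule of Table 9-4-4 and the decoding rule stated above can be sketched as follows. The starting condition is an assumption: the encoder is taken to begin in state $S_1$, that is, as if the preceding input bit had been a 1. The function names are illustrative.

```python
def miller_encode(bits, previous_bit=1):
    """Rate-1/2 (1, 3) Miller encoder: 1 -> 01, 0 -> x0 with x set by the previous input bit."""
    out = []
    prev = previous_bit               # assumed initial condition (state S_1)
    for b in bits:
        if b == 1:
            out.extend((0, 1))
        else:
            x = 0 if prev == 1 else 1
            out.extend((x, 0))
        prev = b
    return out


def miller_decode(coded):
    """Discard the first bit of each two-bit block; the second bit is the data bit."""
    return coded[1::2]
```

For the input 1, 0, 0, 1 the encoder output is 01 00 10 01, and `miller_decode` returns the original bits.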

The Mapping of Coded Bits into Signal Waveforms The output sequence 
from a (d, κ) encoder is mapped by the modulator into signal waveforms for 
transmission over the channel. If the binary digit 1 is mapped into a 
rectangular pulse of amplitude A and the binary digit 0 is mapped into a 
rectangular pulse of amplitude -A, the result is a (d, κ) coded NRZ 
modulated signal. Note that the duration of the rectangular pulses is 
$T_c = R_c/R_b = R_c T_b$, where $R_b$ is the information (bit) rate into the encoder, $T_b$ 
is the corresponding (uncoded) bit interval, and $R_c$ is the code rate for the 
(d, κ) code. 

When the (d, κ) code is a state-independent fixed-length code with code 
rate $R_c = k/n$, we may consider each n-bit block as generating one signal 
waveform of duration $nT_c$. Thus, we have $M = 2^k$ signal waveforms, one for 
each of the $2^k$ possible k-bit data blocks. These coded waveforms have the 
general form given by (4-3-6) and (4-3-38). In this case, there is no dependence 
between the transmission of successive waveforms. 

In contrast to the situation considered above, the modulation signal is no 
longer memoryless when NRZI is used and/or the (d, κ) code is 
state-dependent. Let us consider the effect of mapping the coded bits into an NRZI 
signal waveform. 

Recall that the state dependence in the NRZI signal is due to the 
differential encoding of the information sequence. The differential encoding is 
a form of precoding, which is described mathematically as 

$$p_k = d_k \oplus p_{k-1}$$

where $\{d_k\}$ is the binary sequence into the precoder, $\{p_k\}$ is the output binary 
sequence from the precoder, and ⊕ denotes modulo-2 addition. This encoding 
is characterized by the state diagram shown in Fig. 9-4-7(a). Then, the 
sequence $\{p_k\}$ is transmitted by NRZ. Thus, when $p_k = 1$, the modulator 
output is a rectangular pulse of amplitude A, and when $p_k = 0$, the modulator 
output is a rectangular pulse of amplitude -A. When the signal waveforms are 
superimposed on the state diagram of Fig. 9-4-7(a), we obtain the corresponding 
state diagram shown in Fig. 9-4-7(b). The corresponding trellis is shown in 
Fig. 9-4-7(c). 

FIGURE 9-4-7 State and trellis diagrams for NRZI signal. 
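
The precoding relation and the NRZ mapping described above can be written in a few lines. This is a minimal sketch; the initial precoder output (taken as 0 here) and the function names are assumptions.

```python
def nrzi_precode(d_bits, initial=0):
    """Differential encoding p_k = d_k XOR p_{k-1}, starting from an assumed p_{-1} = initial."""
    p = initial
    out = []
    for d in d_bits:
        p ^= d
        out.append(p)
    return out


def nrz_levels(p_bits, amplitude=1.0):
    """NRZ mapping: p_k = 1 -> +A, p_k = 0 -> -A."""
    return [amplitude if p else -amplitude for p in p_bits]
```

A level change in `nrz_levels(nrzi_precode(d_bits))` occurs exactly at the positions where the input bit is a 1, which is the NRZI property recalled earlier in this section.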

When the output of a state-dependent (d, κ) encoder is followed by an 
NRZI modulator, we may simply combine the two state diagrams into a 
single state diagram for the (d, κ) code with precoding. A similar combination 
can be performed with the corresponding trellises. The following example 
illustrates the approach for the (1, 3) Miller code followed by NRZI 
modulation. 


Example 9-4-7 

Let us determine the state diagram of the combined (1, 3) Miller code 
followed by the precoding inherent in NRZI modulation. Since the (1, 3) 
Miller code has two states and the precoder has two states, the state 
diagram for the combined encoder has four states, which we denote as 
$(S_M, S_N) = (\sigma_1, s_1), (\sigma_1, s_2), (\sigma_2, s_1), (\sigma_2, s_2)$, where $S_M = \{\sigma_1, \sigma_2\}$ represents 
the two states of the Miller code and $S_N = \{s_1, s_2\}$ represents the two states 
of the precoder for NRZI. For each data input bit into the Miller encoder, 
we obtain two output bits which are then precoded to yield two precoded 
output bits. The resulting state diagram is shown in Fig. 9-4-8, where the 
first bit denotes the information bit into the Miller encoder and the next two 
bits represent the corresponding output of the precoder. 

The trellis diagram for the Miller precoded sequence may be obtained 
directly from the combined state diagram or from a combination of the trellises 
of the two codes. The result of this combination is the four-state trellis, one 
stage of which is shown in Fig. 9-4-9. 
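
The combined state diagram can also be generated mechanically by stepping the Miller encoder and the precoder together. In the sketch below the state labels are assumptions made for illustration: the Miller state is 1 when the previous input bit was a 1 and 2 when it was a 0, and the precoder state is its last output bit. The correspondence of these labels to σ and s in Fig. 9-4-8 is not taken from the text.

```python
def miller_nrzi_transitions():
    """Map (miller_state, precoder_state, input_bit) -> ((next states), precoded pair)."""
    table = {}
    for m in (1, 2):                  # Miller state (assumed labeling, see lead-in)
        for p in (0, 1):              # precoder state: last precoded bit
            for bit in (0, 1):
                if bit == 1:
                    coded, m_next = (0, 1), 1
                else:
                    coded, m_next = ((0 if m == 1 else 1), 0), 2
                q = p
                precoded = []
                for c in coded:       # p_k = d_k XOR p_{k-1}, applied to both coded bits
                    q ^= c
                    precoded.append(q)
                table[(m, p, bit)] = ((m_next, q), tuple(precoded))
    return table
```

Each of the four combined states has two outgoing branches, one per input bit, and each branch carries a pair of precoded bits, which matches the branch labeling described for Fig. 9-4-8.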

It is left as an exercise for the reader to show that the four signal waveforms 
obtained by mapping each pair of bits of the Miller-precoded sequence into an 
NRZ signal are biorthogonal and that the resulting modulated signal waveform 
is identical to the delay modulation that was described in Section 4-3-2. 

FIGURE 9-4-8 State diagram of the Miller code followed by the precoder. 

FIGURE 9-4-9 One stage of trellis diagram for the Miller code followed by the precoder. 

From the state diagram of a state-dependent runlength-limited code, one 
can obtain the transition probability matrix, as described in Section 4-3-2. 
Then, the power spectral density of the code may be determined, as shown in 
Section 4-4-3. 

9-5 BIBLIOGRAPHICAL NOTES AND REFERENCES 

The pioneering work on signal design for bandwidth-constrained channels was 
done by Nyquist (1928). The use of binary partial response signals was 
originally proposed by Lender (1963), and was later generalized by Kretzmer 
(1966). Other early work on problems dealing with intersymbol interference 
(ISI) and transmitter- and receiver optimization with constraints on ISI was 
done by Gerst and Diamond (1961), Tufts (1965), Smith (1965), and Berger 
and Tufts (1967). “Faster than Nyquist” transmission has been studied by 
Mazo (1975) and Foschini (1984). 

Modulation codes were also first introduced by Shannon (1948). Some of 
the early work on the construction of runlength-limited codes is found in the 
papers by Freiman and Wyner (1964), Gabor (1967), Franaszek (1968, 1969, 
1970), Tang and Bahl (1970), and Jacoby (1977). More recent work is found in 
papers by Adler, Coppersmith, and Hassner (1983), and Karabed and Siegel 
(1991). The motivation for most of the work on runlength-limited codes was 
provided by applications to magnetic and optical recording. A well-written 
tutorial paper on runlength-limited codes has been published by Immink 
(1990). 


PROBLEMS 


9-1 A channel is said to be distortionless if the response y(t) to an input x(t) is 
$Kx(t - t_0)$, where K and $t_0$ are constants. Show that if the frequency response of 
the channel is $A(f)e^{j\theta(f)}$, where A(f) and θ(f) are real, the necessary and 
sufficient conditions for distortionless transmission are A(f) = K and 
$\theta(f) = 2\pi f t_0 \pm n\pi$, n = 0, 1, 2, .... 

9-2 The raised-cosine spectral characteristic is given by (9-2-26). 
a Show that the corresponding impulse response is 

$$x(t) = \frac{\sin(\pi t/T)}{\pi t/T} \, \frac{\cos(\beta \pi t/T)}{1 - 4\beta^2 t^2/T^2}$$

b Determine the Hilbert transform of x(t) when β = 1. 

c Does $\hat{x}(t)$ possess the desirable properties of x(t) that make it appropriate for 
data transmission? Explain. 

d Determine the envelope of the SSB suppressed-carrier signal generated from 
x(t). 

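9-3 a Show that (Poisson sum formula) 

$$x(t) = \sum_{k=-\infty}^{\infty} g(t) h(t - kT) \;\Rightarrow\; X(f) = \frac{1}{T} \sum_{n=-\infty}^{\infty} H\!\left(\frac{n}{T}\right) G\!\left(f - \frac{n}{T}\right)$$

Hint: Make a Fourier-series expansion of the periodic factor 

$$\sum_{k=-\infty}^{\infty} h(t - kT)$$

b Using the result in (a), verify the following versions of the Poisson sum: 

$$\sum_{k=-\infty}^{\infty} h(t - kT) = \frac{1}{T} \sum_{n=-\infty}^{\infty} H\!\left(\frac{n}{T}\right) \exp(j2\pi n t/T) \tag{i}$$

$$\sum_{k=-\infty}^{\infty} h(kT) \exp(-j2\pi k T f) = \frac{1}{T} \sum_{n=-\infty}^{\infty} H\!\left(f - \frac{n}{T}\right) \tag{ii}$$

c Derive the condition for no intersymbol interference (Nyquist criterion) by 
using the Poisson sum formula. 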

9-4 Suppose a digital communications system employs gaussian-shaped pulses of the 
form 

$$x(t) = \exp(-\pi a^2 t^2)$$

To reduce the level of intersymbol interference to a relatively small amount, we 
impose the condition that x(T) = 0.01, where T is the symbol interval. The 
bandwidth W of the pulse x(t) is defined as that value of W for which 
X(W)/X(0) = 0.01, where X(f) is the Fourier transform of x(t). Determine the 
value of W and compare this value to that of raised-cosine spectrum with 100% 
rolloff. 

9-5 A band-limited signal having bandwidth W can be represented as 

$$x(t) = \sum_{n=-\infty}^{\infty} x_n \frac{\sin[2\pi W(t - n/2W)]}{2\pi W(t - n/2W)}$$

a Determine the spectrum X(f) and plot |X(f)| for the following cases: 

$$x_0 = 2,\quad x_1 = 1,\quad x_2 = -1,\quad x_n = 0,\ n \ne 0, 1, 2 \tag{i}$$

$$x_{-1} = -1,\quad x_0 = 2,\quad x_1 = -1,\quad x_n = 0,\ n \ne -1, 0, 1 \tag{ii}$$

b Plot x(t) for these two cases. 

c If these signals are used for binary signal transmission, determine the number of 
received levels possible at the sampling instants t = nT = n/2W, and the 
probabilities of occurrence of the received levels. Assume that the binary digits 
at the transmitter are equally probable. 

9-6 A 4 kHz bandpass channel is to be used for transmission of data at a rate of 
9600 bits/s. If $\frac{1}{2}N_0 = 10^{-10}$ W/Hz is the spectral density of the additive, zero-mean 
gaussian noise in the channel, design a QAM modulation and determine the 
average power that achieves a bit error probability of $10^{-6}$. Use a signal pulse with 
a raised-cosine spectrum having a roll-off factor of at least 50%. 

9-7 Determine the bit rate that can be transmitted through a 4 kHz voice-band 
telephone (bandpass) channel if the following modulation methods are used: (a) 
binary PAM; (b) four-phase PSK; (c) 8-point QAM; (d) binary orthogonal FSK, 
with noncoherent detection; (e) orthogonal four-FSK with noncoherent detection; 
(f) orthogonal 8-FSK with noncoherent detection. For (a)-(c), assume that the 
transmitter pulse shape has a raised-cosine spectrum with a 50% roll-off. 

9-8 An ideal voice-band telephone line channel has a bandpass frequency response 
characteristic spanning the frequency range 600-3000 Hz. 

a Design an M = 4 PSK (quadrature PSK or QPSK) system for transmitting data 
at a rate of 2400 bits/s and a carrier frequency $f_c = 1800$ Hz. For spectral 
shaping, use a raised-cosine frequency-response characteristic. Sketch a block 
diagram of the system and describe the functional operation of each block. 
b Repeat (a) for a bit rate R = 4800 bits/s. 

9-9 A voice-band telephone channel passes the frequencies in the band from 300 to 
3300 Hz. It is desired to design a modem that transmits at a symbol rate of 2400 
symbols/s, with the objective of achieving 9600 bits/s. Select an appropriate QAM 
signal constellation, carrier frequency, and the roll-off factor of a pulse with a 
raised cosine spectrum that utilizes the entire frequency band. Sketch the spectrum 
of the transmitted signal pulse and indicate the important frequencies. 

9-10 A communication system for a voice-band (3 kHz) channel is designed for a 
received SNR at the detector of 30 dB when the transmitter power is $P_s = -3$ dBW. 
Determine the value of $P_s$ if it is desired to expand the bandwidth of the 
system to 10 kHz, while maintaining the same SNR at the detector. 

9-11 Show that a pulse having the raised cosine spectrum given by (9-2-26) satisfies the 
Nyquist criterion given by (9-2-13) for any value of the roll-off factor β. 

9-12 Show that, for any value of β, the raised cosine spectrum given by (9-2-26) satisfies 

$$\int_{-\infty}^{\infty} X_{rc}(f)\, df = 1$$

[Hint: Use the fact that $X_{rc}(f)$ satisfies the Nyquist criterion given by (9-2-13).] 

9-13 The Nyquist criterion gives the necessary and sufficient condition for the spectrum 
X(f) of the pulse x(t) that yields zero ISI. Prove that for any pulse that is 
band-limited to |f| < 1/T, the zero-ISI condition is satisfied if Re [X(f)], for f > 0, 
consists of a rectangular function plus an arbitrary odd function around f = 1/2T, 
and Im [X(f)] is any arbitrary even function around f = 1/2T. 

9-14 A voice-band telephone channel has a passband characteristic in the frequency 
range 300 Hz < f < 3000 Hz. 

a Select a symbol rate and a power efficient constellation size to achieve 
9600 bits/s signal transmission. 

b If a square-root raised cosine pulse is used for the transmitter pulse g(t), select 
the roll-off factor. Assume that the channel has an ideal frequency response 
characteristic. 

FIGURE P9-16 

FIGURE P9-17 

9-15 Design an M-ary PAM system that transmits digital information over an ideal 
channel with bandwidth W = 2400 Hz. The bit rate is 14 400 bits/s. Specify the 
number of transmitted points, the number of received signal points using a 
duobinary signal pulse, and the required $\mathcal{E}_b$ to achieve an error probability of $10^{-6}$. 
The additive noise is zero-mean gaussian with a power spectral density 
$10^{-4}$ W/Hz. 

9-16 A binary PAM signal is generated by exciting a raised cosine roll-off filter with a 
50% roll-off factor and is then DSB-SC amplitude-modulated on a sinusoidal 
carrier as illustrated in Fig. P9-16. The bit rate is 2400 bit/s. 
a Determine the spectrum of the modulated binary PAM signal and sketch it. 
b Draw the block diagram illustrating the optimum demodulator/detector for the 
received signal, which is equal to the transmitted signal plus additive white 
gaussian noise. 

9-17 The elements of the sequence $\{a_n\}_{n=-\infty}^{\infty}$ are independent binary random variables 
taking values of ±1 with equal probability. This data sequence is used to modulate 
the basic pulse g(t) shown in Fig. P9-17(a). The modulated signal is 

$$X(t) = \sum_{n=-\infty}^{\infty} a_n g(t - nT)$$

a Find the power spectral density of X(t). 

b If $g_1(t)$ (shown in Fig. P9-17(b)) is used instead of g(t), how would the power 
spectrum in (a) change? 

c In (b) assume we want to have a null in the spectrum at f = 1/3T. This is done 
by a precoding of the form $b_n = a_n + \alpha a_{n-1}$. Find the α that provides the desired 
null. 

d Is it possible to employ a precoding of the form $b_n = a_n + \sum_{i=1}^{N} \alpha_i a_{n-i}$ for some 
finite N such that the final power spectrum will be identical to zero for 
$1/3T \le |f| \le 1/2T$? If yes, how? If no, why? [Hint: Use properties of analytic 
functions.] 

FIGURE P9-22 

9-18 Consider the transmission of data via PAM over a voice-band telephone channel 
that has a bandwidth of 3000 Hz. Show how the symbol rate varies as a function of 
the excess bandwidth. In particular, determine the symbol rate for an excess 
bandwidth of 25%, 33%, 50%, 67%, 75%, and 100%. 

9-19 The binary sequence 10010110010 is the input to a precoder whose output is used 
to modulate a duobinary transmitting filter. Construct a table as in Table 9-2-1 
showing the precoded sequence, the transmitted amplitude levels, the received 
signal levels and the decoded sequence. 

9-20 Repeat Problem 9-19 for a modified duobinary signal pulse. 

9-21 A precoder for a partial response signal fails to work if the desired partial 
response at n = 0 is zero modulo M. For example, consider the desired response 
for M = 2: 

$$x(nT) = \begin{cases} 2 & (n = 0) \\ 1 & (n = 1) \\ -1 & (n = 2) \\ 0 & (\text{otherwise}) \end{cases}$$

Show why this response cannot be precoded. 

9-22 Consider the RC lowpass filter shown in Fig. P9-22, where $\tau = RC = 10^{-6}$. 

a Determine and sketch the envelope (group) delay of the filter as a function of 
frequency. 

b Suppose that the input to the filter is a lowpass signal of bandwidth Δf = 1 kHz. 
Determine the effect of the RC filter on this signal. 

9-23 A microwave radio channel has a frequency response 

$$C(f) = 1 + 0.3 \cos 2\pi f T$$

Determine the frequency response characteristic of the optimum transmitting and 
receiving filters that yield zero ISI at a rate of 1/T symbols/s and have a 50% 
excess bandwidth. Assume that the additive noise spectrum is flat. 

9-24 M = 4 PAM modulation is used for transmitting at a bit rate of 9600 bits/s on a 
channel having a frequency response 

$$C(f) = \frac{1}{1 + j(f/2400)}$$

for |f| ≤ 2400, and C(f) = 0 otherwise. The additive noise is zero-mean, white 
Gaussian with power spectral density $\frac{1}{2}N_0$ W/Hz. Determine the (magnitude) 
frequency response characteristic of the optimum transmitting and receiving filters. 

9-25 Determine the capacity of a (0, 1) runlength-limited code. Compare its capacity 
with that of a (1, ∞) code and explain the relationship. 

9-26 A ternary signal format is designed for a channel that does not pass d.c. The 
binary input information sequence is transmitted by mapping a 1 into either a 
positive pulse or a negative pulse, and a zero is transmitted by the absence of a 
pulse. Hence, for the transmission of 1s, the polarity of the pulses alternates. This is 
called an AMI (alternate mark inversion) code. Determine the capacity of the 
code. 

FIGURE P9-31 

FIGURE P9-32 

9-27 Give an alternative description of the AMI code described in Problem 9-26 using 
the running digit sum (RDS) with the constraint that the RDS can take only the 
values 0 and +1. 

9-28 (kBnT codes) From Problem 9-26, note that the AMI code is a "pseudo-ternary" 
code in that it transmits one bit per symbol using a ternary alphabet, which has the 
capacity of $\log_2 3 = 1.58$ bits. Such a code does not provide sufficient spectral 
shaping. Better spectral shaping is achieved by the class of block codes designated 
as kBnT, where k denotes the number of information bits and n denotes the 
number of ternary symbols per block. By selecting the largest k possible for each 
n, we obtain the following table: 

k    n    Code 
1    1    1B1T 
3    2    3B2T 
4    3    4B3T 
6    4    6B4T 

Determine the efficiency of these codes by computing the ratio of the code rate in 
bits/symbol to $\log_2 3$. Note that 1B1T is the AMI code. 

9-29 This problem deals with the capacity of two (d, k) codes. 

a Determine the capacity of a ( d , k) code that has the following state transition 
matrix: 



b Repeat (a) for 



c Comment on the differences between (a) and (b). 

9-30 A simplified model of the telegraph code consists of two symbols (Blahut, 1990). 
A dot consists of one time unit of line closure followed by one time unit of line 
open. A dash consists of three units of line closure followed by one time unit of 
line open. 

a Viewing this code as a constrained code with symbols of equal duration, give the 
constraints. 

b Determine the state-transition matrix. 

c Determine the capacity. 

9-31 Determine the state-transition matrix for the runlength-constrained code described 
by the state diagram shown in Fig. P9-31. Sketch the corresponding trellis. 

9-32 Determine the state-transition matrix for the (2, 7) runlength-limited code 
specified by the state diagram shown in Fig. P9-32. 



10 


COMMUNICATION 
THROUGH BAND-LIMITED 
LINEAR FILTER CHANNELS 


In Chapter 9, we focused on the design of the modulator and demodulator 
filters for band-limited channels. The design procedure was based on the 
assumption that the (ideal or non-ideal) channel response characteristic C(f) 
was known a priori. However, in practical digital communications systems that 
are designed to transmit at high speed through band-limited channels, the 
frequency response C(f) of the channel is not known with sufficient precision 
to design optimum filters for the modulator and demodulator. For example, in 
digital communication over the dial-up telephone network, the communication 
channel will be different every time we dial a number, because the channel 
route will be different. This is an example of a channel whose characteristics 
are unknown a priori. There are other types of channels, e.g., wireless channels
such as radio channels and underwater acoustic channels, whose frequency 
response characteristics are time-variant. For such channels, it is not possible 
to design optimum fixed demodulation filters. 

In this chapter, we consider the problem of receiver design in the presence 
of channel distortion, which is not known a priori, and AWGN. The channel 
distortion results in intersymbol interference, which, if left uncompensated, 
causes high error rates. The solution to the ISI problem is to design a receiver 
that employs a means for compensating or reducing the ISI in the received 
signal. The compensator for the ISI is called an equalizer. 

Three types of equalization methods are treated in this chapter. One is 
based on the maximum-likelihood (ML) sequence detection criterion, which is 
optimum from a probability of error viewpoint. A second equalization method 
is based on the use of a linear filter with adjustable coefficients. The third
equalization method that is described exploits the use of previously detected
symbols to suppress the ISI in the present symbol being detected, and it is
called decision-feedback equalization. We begin with the derivation of the
optimum detector for channels with ISI.


10-1 OPTIMUM RECEIVER FOR CHANNELS WITH 
ISI AND AWGN 

In this section, we derive the structure of the optimum demodulator and 
detector for digital transmission through a nonideal, band-limited channel with 
additive gaussian noise. We begin with the transmitted (equivalent lowpass) 
signal given by (9-2-1). The received (equivalent lowpass) signal is expressed as 

$$r_l(t) = \sum_n I_n h(t - nT) + z(t) \tag{10-1-1}$$

where h(t) represents the response of the channel to the input signal pulse g(t) 
and z(t) represents the additive white gaussian noise. 

First we demonstrate that the optimum demodulator can be realized as a
filter matched to $h(t)$, followed by a sampler operating at the symbol rate $1/T$
and a subsequent processing algorithm for estimating the information sequence
$\{I_n\}$ from the sample values. Consequently, the samples at the output of the
matched filter are sufficient for the estimation of the sequence $\{I_n\}$.


10-1-1 Optimum Maximum-Likelihood Receiver

Let us expand the received signal $r_l(t)$ in the series

$$r_l(t) = \lim_{N \to \infty} \sum_{k=1}^{N} r_k f_k(t) \tag{10-1-2}$$

where $\{f_k(t)\}$ is a complete set of orthonormal functions and $\{r_k\}$ are the
observable random variables obtained by projecting $r_l(t)$ onto the set $\{f_k(t)\}$. It
is easily shown that

$$r_k = \sum_n I_n h_{kn} + z_k, \qquad k = 1, 2, \ldots \tag{10-1-3}$$

where $h_{kn}$ is the value obtained from projecting $h(t - nT)$ onto $f_k(t)$, and $z_k$ is
the value obtained from projecting $z(t)$ onto $f_k(t)$. The sequence $\{z_k\}$ is
gaussian with zero mean and covariance

$$\tfrac{1}{2} E(z_k^* z_m) = N_0 \delta_{km} \tag{10-1-4}$$

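The joint probability density function of the random variables
$\mathbf{r}_N = [r_1\ r_2\ \ldots\ r_N]$ conditioned on the transmitted sequence
$\mathbf{I}_p = [I_1\ I_2\ \ldots\ I_p]$, where $p \le N$, is

$$p(\mathbf{r}_N \mid \mathbf{I}_p) = \left(\frac{1}{2\pi N_0}\right)^{N} \exp\left(-\frac{1}{2N_0} \sum_{k=1}^{N} \left| r_k - \sum_n I_n h_{kn} \right|^2\right) \tag{10-1-5}$$

In the limit as the number $N$ of observable random variables approaches
infinity, the logarithm of $p(\mathbf{r}_N \mid \mathbf{I}_p)$ is proportional to the metrics $PM(\mathbf{I}_p)$,
defined as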

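$$\begin{aligned}
PM(\mathbf{I}_p) &= -\int_{-\infty}^{\infty} \left| r_l(t) - \sum_n I_n h(t - nT) \right|^2 dt \\
&= -\int_{-\infty}^{\infty} |r_l(t)|^2\, dt + 2\,\mathrm{Re} \sum_n \left[ I_n^* \int_{-\infty}^{\infty} r_l(t) h^*(t - nT)\, dt \right] \\
&\quad - \sum_n \sum_m I_n^* I_m \int_{-\infty}^{\infty} h^*(t - nT) h(t - mT)\, dt
\end{aligned} \tag{10-1-6}$$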

The maximum-likelihood estimates of the symbols $I_1, I_2, \ldots, I_p$ are those that
maximize this quantity. Note, however, that the integral of $|r_l(t)|^2$ is common to
all metrics, and, hence, it may be discarded. The other integral involving $r_l(t)$
gives rise to the variables

$$y_n \equiv y(nT) = \int_{-\infty}^{\infty} r_l(t) h^*(t - nT)\, dt \tag{10-1-7}$$


These variables can be generated by passing $r_l(t)$ through a filter matched to
$h(t)$ and sampling the output at the symbol rate $1/T$. The samples $\{y_n\}$ form a
set of sufficient statistics for the computation of $PM(\mathbf{I}_p)$ or, equivalently, of the
correlation metrics

$$CM(\mathbf{I}_p) = 2\,\mathrm{Re}\left( \sum_n I_n^* y_n \right) - \sum_n \sum_m I_n^* I_m x_{n-m} \tag{10-1-8}$$

where, by definition, $x(t)$ is the response of the matched filter to $h(t)$ and

$$x_n \equiv x(nT) = \int_{-\infty}^{\infty} h^*(t) h(t + nT)\, dt \tag{10-1-9}$$

Hence, $x(t)$ represents the output of a filter having an impulse response $h^*(-t)$
and an excitation $h(t)$. In other words, $x(t)$ represents the autocorrelation
function of $h(t)$. Consequently, $\{x_n\}$ represents the samples of the autocorrelation
function of $h(t)$, taken periodically at $1/T$. We are not particularly
concerned with the noncausal characteristic of the filter matched to $h(t)$, since,
in practice, we can introduce a sufficiently large delay to ensure causality of the
matched filter.

If we substitute for $r_l(t)$ in (10-1-7) using (10-1-1), we obtain

$$y_k = \sum_n I_n x_{k-n} + \nu_k \tag{10-1-10}$$

where $\nu_k$ denotes the additive noise sequence of the output of the matched
filter, i.e.,

$$\nu_k = \int_{-\infty}^{\infty} z(t) h^*(t - kT)\, dt \tag{10-1-11}$$


The output of the demodulator (matched filter) at the sampling instants is
corrupted by ISI as indicated by (10-1-10). In any practical system, it is
reasonable to assume that the ISI affects a finite number of symbols. Hence,
we may assume that $x_n = 0$ for $|n| > L$. Consequently, the ISI observed at the
output of the demodulator may be viewed as the output of a finite state
machine. This implies that the channel output with ISI may be represented by
a trellis diagram, and the maximum-likelihood estimate of the information
sequence $(I_1, I_2, \ldots, I_p)$ is simply the most probable path through the trellis
given the received demodulator output sequence $\{y_n\}$. Clearly, the Viterbi
algorithm provides an efficient means for performing the trellis search.

The metrics that are computed for the MLSE of the sequence $\{I_k\}$ are given
by (10-1-8). It can be seen that these metrics can be computed recursively in
the Viterbi algorithm, according to the relation

$$CM_n(\mathbf{I}_n) = CM_{n-1}(\mathbf{I}_{n-1}) + \mathrm{Re}\left[ I_n^* \left( 2y_n - x_0 I_n - 2 \sum_{m=1}^{L} x_m I_{n-m} \right) \right] \tag{10-1-12}$$


Figure 10-1-1 illustrates the block diagram of the optimum receiver for an 
AWGN channel with ISI. 


10-1-2 A Discrete-Time Model for a Channel with ISI 

In dealing with band-limited channels that result in ISI, it is convenient to
develop an equivalent discrete-time model for the analog (continuous-time)
system. Since the transmitter sends discrete-time symbols at a rate
$1/T$ symbols/s and the sampled output of the matched filter at the receiver is
also a discrete-time signal with samples occurring at a rate $1/T$ per second, it
follows that the cascade of the analog filter at the transmitter with impulse
response $g(t)$, the channel with impulse response $c(t)$, the matched filter at the
receiver with impulse response $h^*(-t)$, and the sampler can be represented by
an equivalent discrete-time transversal filter having tap gain coefficients $\{x_k\}$.
Consequently, we have an equivalent discrete-time transversal filter that spans
a time interval of $2LT$ seconds. Its input is the sequence of information
symbols $\{I_k\}$ and its output is the discrete-time sequence $\{y_k\}$ given by
(10-1-10). The equivalent discrete-time model is shown in Fig. 10-1-2.

FIGURE 10-1-1 Optimum receiver for an AWGN channel with ISI.

FIGURE 10-1-2 Equivalent discrete-time model of channel with intersymbol interference.
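As a rough illustration of the transversal-filter model, the noise-free part of (10-1-10) can be sketched in a few lines of Python. The function name `isi_channel_output`, the real-valued tap values, and the short symbol sequence are assumptions chosen only for this sketch; they do not come from the text.

```python
def isi_channel_output(symbols, x_taps):
    """Noise-free samples y_k = sum_n I_n * x_{k-n} for k = 0..len(symbols)-1.

    x_taps lists x_{-L}, ..., x_0, ..., x_L (length 2L + 1), so the filter
    spans 2L symbol intervals, as in the equivalent discrete-time model.
    """
    L = (len(x_taps) - 1) // 2
    y = []
    for k in range(len(symbols)):
        acc = 0
        for n in range(len(symbols)):
            if abs(k - n) <= L:                 # x_m = 0 for |m| > L
                acc += symbols[n] * x_taps[(k - n) + L]
        y.append(acc)
    return y


# Hypothetical example: L = 1 with x_{-1} = x_1 = 0.5 and x_0 = 1.25.
x_taps = [0.5, 1.25, 0.5]
symbols = [1, -1, 1, 1]                          # binary PAM symbols I_0..I_3
y = isi_channel_output(symbols, x_taps)
print(y)
```

Each output sample mixes the current symbol (scaled by x_0) with its L neighbors on either side, which is the ISI that the remainder of the section sets out to handle.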

The major difficulty with this discrete-time model occurs in the evaluation of
performance of the various equalization or estimation techniques that are
discussed in the following sections. The difficulty is caused by the correlations
in the noise sequence $\{\nu_k\}$ at the output of the matched filter. That is, the set of
noise variables $\{\nu_k\}$ is a gaussian-distributed sequence with zero mean and
autocorrelation function (see Problem 10-5)

$$\tfrac{1}{2} E(\nu_k \nu_j^*) = \begin{cases} N_0 x_{k-j} & (|k - j| \le L) \\ 0 & (\text{otherwise}) \end{cases} \tag{10-1-13}$$

Hence, the noise sequence is correlated unless $x_k = 0$, $k \ne 0$. Since it is more
convenient to deal with the white noise sequence when calculating the error
rate performance, it is desirable to whiten the noise sequence by further
filtering the sequence $\{y_k\}$. A discrete-time noise-whitening filter is determined
as follows.

Let $X(z)$ denote the (two-sided) z transform of the sampled autocorrelation
function $\{x_k\}$, i.e.,

$$X(z) = \sum_{k=-L}^{L} x_k z^{-k} \tag{10-1-14}$$

Since $x_k = x_{-k}^*$, it follows that $X(z) = X^*(z^{-1})$ and the $2L$ roots of $X(z)$ have
the symmetry that if $\rho$ is a root, $1/\rho^*$ is also a root. Hence, $X(z)$ can be
factored and expressed as

$$X(z) = F(z) F^*(z^{-1}) \tag{10-1-15}$$






where $F(z)$ is a polynomial of degree $L$ having the roots $\rho_1, \rho_2, \ldots, \rho_L$ and
$F^*(z^{-1})$ is a polynomial of degree $L$ having the roots $1/\rho_1^*, 1/\rho_2^*, \ldots, 1/\rho_L^*$.
Then an appropriate noise-whitening filter has a z transform $1/F^*(z^{-1})$. Since
there are $2^L$ possible choices for the roots of $F^*(z^{-1})$, each choice resulting in
a filter characteristic that is identical in magnitude but different in phase from
other choices of the roots, we propose to choose the unique $F^*(z^{-1})$ having
minimum phase, i.e., the polynomial having all its roots inside the unit circle.
Thus, when all the roots of $F^*(z^{-1})$ are inside the unit circle, $1/F^*(z^{-1})$ is a
physically realizable, stable, recursive discrete-time filter.† Consequently,
passage of the sequence $\{y_k\}$ through the digital filter $1/F^*(z^{-1})$ results in an
output sequence $\{v_k\}$ that can be expressed as

$$v_k = \sum_{n=0}^{L} f_n I_{k-n} + \eta_k \tag{10-1-16}$$

where $\{\eta_k\}$ is a white gaussian noise sequence and $\{f_k\}$ is a set of tap
coefficients of an equivalent discrete-time transversal filter having a transfer
function $F(z)$. In general, the sequence $\{v_k\}$ is complex-valued.

In summary, the cascade of the transmitting filter $g(t)$, the channel $c(t)$, the
matched filter $h^*(-t)$, the sampler, and the discrete-time noise-whitening filter
$1/F^*(z^{-1})$ can be represented as an equivalent discrete-time transversal filter
having the set $\{f_k\}$ as its tap coefficients. The additive noise sequence $\{\eta_k\}$
corrupting the output of the discrete-time transversal filter is a white gaussian
noise sequence having zero mean and variance $N_0$. Figure 10-1-3 illustrates the
model of the equivalent discrete-time system with white noise. We refer to this
model as the equivalent discrete-time white noise filter model.

FIGURE 10-1-3 Equivalent discrete-time model of intersymbol interference channel with WGN.

†By removing the stability condition, we can also allow $F^*(z^{-1})$ to have roots on the unit circle.

Example 10-1-1


Suppose that the transmitter signal pulse $g(t)$ has duration $T$ and unit
energy and the received signal pulse is $h(t) = g(t) + a g(t - T)$. Let us
determine the equivalent discrete-time white-noise filter model. The sample
autocorrelation function is given by

$$x_k = \begin{cases} a^* & (k = -1) \\ 1 + |a|^2 & (k = 0) \\ a & (k = 1) \end{cases} \tag{10-1-17}$$

The z transform of $x_k$ is

$$X(z) = \sum_{k=-1}^{1} x_k z^{-k} = a^* z + (1 + |a|^2) + a z^{-1} = (a z^{-1} + 1)(a^* z + 1) \tag{10-1-18}$$

Under the assumption that $|a| > 1$, one chooses $F(z) = a z^{-1} + 1$, so that the
equivalent transversal filter consists of two taps having tap gain coefficients
$f_0 = 1$, $f_1 = a$. Note that the correlation sequence $\{x_k\}$ may be expressed in
terms of the $\{f_n\}$ as

$$x_k = \sum_{n=0}^{L-k} f_n^* f_{n+k}, \qquad k = 0, 1, 2, \ldots, L \tag{10-1-19}$$
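The relation (10-1-19) is easy to check numerically for the two-tap filter of this example. The sketch below is only an illustration: the function name `autocorrelation_from_taps` and the particular value of a (any complex number with |a| > 1 would do) are assumptions, not part of the text.

```python
def autocorrelation_from_taps(f):
    """Return [x_0, ..., x_L] with x_k = sum_{n=0}^{L-k} conj(f_n) * f_{n+k}."""
    L = len(f) - 1
    x = []
    for k in range(L + 1):
        x.append(sum(complex(f[n]).conjugate() * f[n + k]
                     for n in range(L - k + 1)))
    return x


a = 2 + 1j                    # hypothetical channel coefficient with |a| > 1
f = [1, a]                    # f_0 = 1, f_1 = a, i.e. F(z) = 1 + a z^{-1}
x = autocorrelation_from_taps(f)
print(x)                      # x_0 should equal 1 + |a|^2 and x_1 should equal a
```

The negative-index samples follow from the symmetry $x_{-k} = x_k^*$, so the two values returned here determine the whole sequence in (10-1-17).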

When the channel impulse response is changing slowly with time, the 
matched filter at the receiver becomes a time-variable filter. In this case, the 
time variations of the channel/matched-filter pair result in a discrete-time filter 
with time-variable coefficients. As a consequence, we have time-variable 
intersymbol interference effects, which can be modeled by the filter illustrated
in Fig. 10-1-3, where the tap coefficients are slowly varying with time. 

The discrete-time white noise linear filter model for the intersymbol 
interference effects that arise in high-speed digital transmission over nonideal 
band-limited channels will be used throughout the remainder of this chapter in 
our discussion of compensation techniques for the interference. In general, the 
compensation methods are called equalization techniques or equalization 
algorithms. 


10-1-3 The Viterbi Algorithm for the Discrete-Time White
Noise Filter Model 

MLSE of the information sequence $\{I_k\}$ is most easily described in terms of the
received sequence $\{v_k\}$ at the output of the whitening filter. In the presence of
intersymbol interference that spans $L + 1$ symbols ($L$ interfering components),
the MLSE criterion is equivalent to the problem of estimating the state of a
discrete-time finite-state machine. The finite-state machine in this case is the
equivalent discrete-time channel with coefficients $\{f_k\}$, and its state at any
instant in time is given by the $L$ most recent inputs, i.e., the state at time $k$ is

$$S_k = (I_{k-1}, I_{k-2}, \ldots, I_{k-L}) \tag{10-1-20}$$

where $I_k = 0$ for $k \le 0$. Hence, if the information symbols are $M$-ary, the
channel filter has $M^L$ states. Consequently, the channel is described by an
$M^L$-state trellis and the Viterbi algorithm may be used to determine the most
probable path through the trellis.

The metrics used in the trellis search are akin to the metrics used in
soft-decision decoding of convolutional codes. In brief, we begin with the
samples $v_1, v_2, \ldots, v_{L+1}$, from which we compute the $M^{L+1}$ metrics

$$\sum_{k=1}^{L+1} \ln p(v_k \mid I_k, I_{k-1}, \ldots, I_{k-L}) \tag{10-1-21}$$

The $M^{L+1}$ possible sequences of $I_{L+1}, I_L, \ldots, I_2, I_1$ are subdivided into $M^L$
groups corresponding to the $M^L$ states $(I_{L+1}, I_L, \ldots, I_2)$. Note that the $M$
sequences in each group (state) differ in $I_1$ and correspond to the paths through
the trellis that merge at a single node. From the $M$ sequences in each of the
$M^L$ states, we select the sequence with the largest probability (with respect to
$I_1$) and assign to the surviving sequence the metric

$$PM_1(\mathbf{I}_{L+1}) \equiv PM_1(I_{L+1}, I_L, \ldots, I_2) = \max_{I_1} \sum_{k=1}^{L+1} \ln p(v_k \mid I_k, I_{k-1}, \ldots, I_{k-L}) \tag{10-1-22}$$

The $M - 1$ remaining sequences from each of the $M^L$ groups are discarded.
Thus, we are left with $M^L$ surviving sequences and their metrics.

Upon reception of $v_{L+2}$, the $M^L$ surviving sequences are extended by one
stage, and the corresponding $M^{L+1}$ probabilities for the extended sequences
are computed using the previous metrics and the new increment, which is
$\ln p(v_{L+2} \mid I_{L+2}, I_{L+1}, \ldots, I_2)$. Again, the $M^{L+1}$ sequences are subdivided into
$M^L$ groups corresponding to the $M^L$ possible states $(I_{L+2}, \ldots, I_3)$ and the most
probable sequence from each group is selected, while the other $M - 1$
sequences are discarded.

The procedure described continues with the reception of subsequent signal
samples. In general, upon reception of $v_{L+k}$, the metrics†

$$PM_k(\mathbf{I}_{L+k}) = \max_{I_k} \left[ \ln p(v_{L+k} \mid I_{L+k}, \ldots, I_k) + PM_{k-1}(\mathbf{I}_{L+k-1}) \right] \tag{10-1-23}$$

that are computed give the probabilities of the $M^L$ surviving sequences. Thus,
as each signal sample is received, the Viterbi algorithm involves first the
computation of the $M^{L+1}$ probabilities

$$\ln p(v_{L+k} \mid I_{L+k}, \ldots, I_k) + PM_{k-1}(\mathbf{I}_{L+k-1}) \tag{10-1-24}$$

corresponding to the $M^{L+1}$ sequences that form the continuations of the $M^L$
surviving sequences from the previous stage of the process. Then the $M^{L+1}$
sequences are subdivided into $M^L$ groups, with each group containing $M$
sequences that terminate in the same set of symbols $I_{L+k}, \ldots, I_{k+1}$ and differ in
the symbol $I_k$. From each group of $M$ sequences, we select the one having the
largest probability as indicated by (10-1-23), while the remaining $M - 1$
sequences are discarded. Thus, we are left again with $M^L$ sequences having the
metrics $PM_k(\mathbf{I}_{L+k})$.

†We observe that the metrics $PM_k(\mathbf{I})$ are simply related to the euclidean distance metrics
$DM_k(\mathbf{I})$ when the additive noise is gaussian.

As indicated previously, the delay in detecting each information symbol is
variable. In practice, the variable delay is avoided by truncating the surviving
sequences to the $q$ most recent symbols, where $q > L$, thus achieving a fixed
delay. In the case that the $M^L$ surviving sequences at time $k$ disagree on the
symbol $I_{k-q}$, the symbol in the most probable sequence may be chosen. The
loss in performance resulting from this suboptimum decision procedure is
negligible if $q \ge 5L$.


Example 10-1-2 

For illustrative purposes, suppose that a duobinary signal pulse is employed
to transmit four-level ($M = 4$) PAM. Thus, each symbol is a number
selected from the set $\{-3, -1, 1, 3\}$. The controlled intersymbol interference
in this partial response signal is represented by the equivalent discrete-time
channel model shown in Fig. 10-1-4. Suppose we have received $v_1$ and $v_2$,
where

$$v_1 = I_1 + \eta_1, \qquad v_2 = I_2 + I_1 + \eta_2 \tag{10-1-25}$$

and $\{\eta_i\}$ is a sequence of statistically independent zero-mean gaussian noise.
We may now compute the 16 metrics

$$PM_1(I_2, I_1) = -\sum_{k=1}^{2} \left( v_k - \sum_{j=0}^{1} I_{k-j} \right)^2, \qquad I_1, I_2 = \pm 1, \pm 3 \tag{10-1-26}$$

where $I_k = 0$ for $k \le 0$.

FIGURE 10-1-4 Equivalent discrete-time model for intersymbol interference resulting from a duobinary pulse.

Note that any subsequently received signals $\{v_i\}$ do not involve $I_1$. Hence,
at this stage, we may discard 12 of the 16 possible pairs $\{I_1, I_2\}$. This step is
illustrated by the tree diagram shown in Fig. 10-1-5. In other words, after
computing the 16 metrics corresponding to the 16 paths in the tree diagram,
we discard three out of the four paths that terminate with $I_2 = 3$ and save the
most probable of these four. Thus, the metric for the surviving path is

$$PM_1(I_2 = 3, I_1) = \max_{I_1} \left[ -\sum_{k=1}^{2} \left( v_k - \sum_{j=0}^{1} I_{k-j} \right)^2 \right]$$

The process is repeated for each set of four paths terminating with $I_2 = 1$,
$I_2 = -1$, and $I_2 = -3$. Thus four paths and their corresponding metrics survive
after $v_1$ and $v_2$ are received.

FIGURE 10-1-5 Tree diagram for Viterbi decoding of the duobinary pulse.

When $v_3$ is received, the four paths are extended as shown in Fig. 10-1-5, to
yield 16 paths and 16 corresponding metrics, given by

$$PM_2(I_3, I_2, I_1) = PM_1(I_2, I_1) - \left( v_3 - \sum_{j=0}^{1} I_{3-j} \right)^2 \tag{10-1-27}$$

Of the four paths terminating with $I_3 = 3$, we save the most probable. This
procedure is again repeated for $I_3 = 1$, $I_3 = -1$, and $I_3 = -3$. Consequently,
only four paths survive at this stage. The procedure is then repeated for each
subsequently received signal $v_k$ for $k > 3$.
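The add-compare-select steps of this example can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the book's implementation: the function name `viterbi_mlse`, the received samples, and the choice to decide only after the last sample (no truncation to a fixed delay q) are all hypothetical. The branch metric is the negative squared Euclidean distance, as in (10-1-26) and (10-1-27).

```python
def viterbi_mlse(v, f, alphabet):
    """MLSE of real PAM symbols from v_k = sum_j f[j] * I_{k-j} + noise.

    The state is the tuple of the L most recent symbols; symbols before the
    first one are taken as 0, matching I_k = 0 for k <= 0.
    """
    L = len(f) - 1
    survivors = {(0,) * L: (0.0, [])}            # state -> (metric, path)
    for vk in v:
        candidates = {}
        for state, (metric, path) in survivors.items():
            for sym in alphabet:
                # state[0] is I_{k-1}, state[1] is I_{k-2}, and so on
                noiseless = f[0] * sym + sum(f[j] * state[j - 1]
                                             for j in range(1, L + 1))
                new_metric = metric - (vk - noiseless) ** 2
                new_state = ((sym,) + state)[:L]
                best = candidates.get(new_state)
                if best is None or new_metric > best[0]:
                    candidates[new_state] = (new_metric, path + [sym])
        survivors = candidates                   # at most M**L paths survive
    return max(survivors.values(), key=lambda mp: mp[0])[1]


# Hypothetical received samples for the duobinary channel F(z) = 1 + z^{-1}:
# the symbols [3, -1, 1, -3, 1] plus small noise.
v = [3.2, 1.7, 0.3, -2.4, -1.8]
decoded = viterbi_mlse(v, f=[1, 1], alphabet=[-3, -1, 1, 3])
print(decoded)
```

With M = 4 and L = 1, the dictionary `survivors` holds four paths after every sample, mirroring the four surviving paths of the example. A practical receiver would additionally truncate each path to the q most recent symbols, as described before the example.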


10-1-4 Performance of MLSE for Channels with ISI 

We shall now determine the probability of error for MLSE of the received 
information sequence when the information is transmitted via PAM and the 
additive noise is gaussian. The similarity between a convolutional code and a 
finite-duration intersymbol interference channel implies that the method for 
computing the error probability for the latter carries over from the former. In 
particular, the method for computing the performance of soft-decision decod- 
ing of a convolutional code by means of the Viterbi algorithm, described in 
Section 8-2-3, applies with some modification. 

In PAM signaling with additive gaussian noise and intersymbol interference,
the metrics used in the Viterbi algorithm may be expressed as in (10-1-23), or
equivalently, as

$$PM_{k-L}(\mathbf{I}_k) = PM_{k-L-1}(\mathbf{I}_{k-1}) - \left( v_k - \sum_{j=0}^{L} f_j I_{k-j} \right)^2 \tag{10-1-28}$$

where the symbols $\{I_n\}$ may take the values $\pm d, \pm 3d, \ldots, \pm(M - 1)d$, and $2d$
is the distance between successive levels. The trellis has $M^L$ states, defined at
time $k$ as

$$S_k = (I_{k-1}, I_{k-2}, \ldots, I_{k-L}) \tag{10-1-29}$$

Let the estimated symbols from the Viterbi algorithm be denoted by $\{\tilde{I}_n\}$
and the corresponding estimated state at time $k$ by

$$\tilde{S}_k = (\tilde{I}_{k-1}, \tilde{I}_{k-2}, \ldots, \tilde{I}_{k-L}) \tag{10-1-30}$$

Now suppose that the estimated path through the trellis diverges from the
correct path at time $k$ and remerges with the correct path at time $k + l$. Thus,
$\tilde{S}_k = S_k$ and $\tilde{S}_{k+l} = S_{k+l}$, but $\tilde{S}_m \ne S_m$ for $k < m < k + l$. As in a convolutional
code, we call this an error event. Since the channel spans $L + 1$ symbols, it
follows that $l \ge L + 1$.

For such an error event, we have $\tilde{I}_k \ne I_k$ and $\tilde{I}_{k+l-L-1} \ne I_{k+l-L-1}$, but $\tilde{I}_m = I_m$
for $k - L \le m \le k - 1$ and $k + l - L \le m \le k + l - 1$. It is convenient to define
an error vector $\boldsymbol{\epsilon}$ corresponding to this error event as

$$\boldsymbol{\epsilon} = [\epsilon_k\ \ \epsilon_{k+1}\ \ \ldots\ \ \epsilon_{k+l-L-1}] \tag{10-1-31}$$

where the components of $\boldsymbol{\epsilon}$ are defined as

$$\epsilon_j = \frac{1}{2d} (I_j - \tilde{I}_j), \qquad j = k, k + 1, \ldots, k + l - L - 1 \tag{10-1-32}$$


The normalization factor of $2d$ in (10-1-32) results in elements $\epsilon_j$ that take on
the values $\pm 1, \pm 2, \pm 3, \ldots, \pm(M - 1)$. Moreover, the error vector is
characterized by the properties that $\epsilon_k \ne 0$, $\epsilon_{k+l-L-1} \ne 0$, and there is no sequence of
$L$ consecutive elements that are zero. Associated with the error vector in
(10-1-31) is the polynomial of degree $l - L - 1$,

$$\epsilon(z) = \epsilon_k + \epsilon_{k+1} z^{-1} + \epsilon_{k+2} z^{-2} + \ldots + \epsilon_{k+l-L-1} z^{-(l-L-1)} \tag{10-1-33}$$

We wish to determine the probability of occurrence of the error event that
begins at time $k$ and is characterized by the error vector $\boldsymbol{\epsilon}$ given in (10-1-31),
or, equivalently, by the polynomial given in (10-1-33). To accomplish this, we
follow the procedure developed by Forney (1972). Specifically, for the error
event $\boldsymbol{\epsilon}$ to occur, the following three subevents $E_1$, $E_2$, and $E_3$ must occur:

$E_1$: at time $k$, $\tilde{S}_k = S_k$;

$E_2$: the information symbols $I_k, I_{k+1}, \ldots, I_{k+l-L-1}$ when added to the
scaled error sequence $2d(\epsilon_k, \epsilon_{k+1}, \ldots, \epsilon_{k+l-L-1})$ must result in an
allowable sequence, i.e., the sequence $\tilde{I}_k, \tilde{I}_{k+1}, \ldots, \tilde{I}_{k+l-L-1}$ must have
values selected from $\pm d, \pm 3d, \ldots, \pm(M - 1)d$;

$E_3$: for $k \le m < k + l$, the sum of the branch metrics of the estimated path
exceeds the sum of the branch metrics of the correct path.

The probability of occurrence of $E_3$ is

$$P(E_3) = P\left[ \sum_{i=k}^{k+l-1} \left( v_i - \sum_{j=0}^{L} f_j \tilde{I}_{i-j} \right)^2 < \sum_{i=k}^{k+l-1} \left( v_i - \sum_{j=0}^{L} f_j I_{i-j} \right)^2 \right] \tag{10-1-34}$$

But

$$v_i = \sum_{j=0}^{L} f_j I_{i-j} + \eta_i \tag{10-1-35}$$

where $\{\eta_i\}$ is a real-valued white gaussian noise sequence. Substitution of
(10-1-35) into (10-1-34) yields

$$\begin{aligned}
P(E_3) &= P\left[ \sum_{i=k}^{k+l-1} \left( \eta_i + 2d \sum_{j=0}^{L} f_j \epsilon_{i-j} \right)^2 < \sum_{i=k}^{k+l-1} \eta_i^2 \right] \\
&= P\left[ 4d \sum_{i=k}^{k+l-1} \eta_i \left( \sum_{j=0}^{L} f_j \epsilon_{i-j} \right) < -4d^2 \sum_{i=k}^{k+l-1} \left( \sum_{j=0}^{L} f_j \epsilon_{i-j} \right)^2 \right]
\end{aligned} \tag{10-1-36}$$

where $\epsilon_j = 0$ for $j < k$ and $j > k + l - L - 1$. If we define

$$\alpha_i = \sum_{j=0}^{L} f_j \epsilon_{i-j} \tag{10-1-37}$$

then (10-1-36) may be expressed as

$$P(E_3) = P\left( \sum_{i=k}^{k+l-1} \alpha_i \eta_i < -d \sum_{i=k}^{k+l-1} \alpha_i^2 \right) \tag{10-1-38}$$

where the factor of $4d$ common to both terms has been dropped. Now
(10-1-38) is just the probability that a linear combination of statistically
independent gaussian random variables is less than some negative number.
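Thus

$$P(E_3) = Q\left( \sqrt{\frac{2d^2}{N_0} \sum_{i=k}^{k+l-1} \alpha_i^2} \right) \tag{10-1-39}$$

For convenience, we define

$$\delta^2(\boldsymbol{\epsilon}) = \sum_{i=k}^{k+l-1} \alpha_i^2 = \sum_{i=k}^{k+l-1} \left( \sum_{j=0}^{L} f_j \epsilon_{i-j} \right)^2 \tag{10-1-40}$$

where $\epsilon_j = 0$ for $j < k$ and $j > k + l - L - 1$. Note that the $\{\alpha_i\}$ resulting from
the convolution of $\{f_i\}$ with $\{\epsilon_j\}$ are the coefficients of the polynomial

$$\alpha(z) = F(z)\epsilon(z) = \alpha_k + \alpha_{k+1} z^{-1} + \ldots + \alpha_{k+l-1} z^{-(l-1)} \tag{10-1-41}$$

Furthermore, $\delta^2(\boldsymbol{\epsilon})$ is simply equal to the coefficient of $z^0$ in the polynomial

$$\alpha(z)\alpha(z^{-1}) = F(z)F(z^{-1})\epsilon(z)\epsilon(z^{-1}) = X(z)\epsilon(z)\epsilon(z^{-1}) \tag{10-1-42}$$

We call $\delta^2(\boldsymbol{\epsilon})$ the euclidean weight of the error event $\boldsymbol{\epsilon}$.

An alternative method for representing the result of convolving $\{f_i\}$ with $\{\epsilon_j\}$
is the matrix form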

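$$\boldsymbol{\alpha} = \mathbf{e}\mathbf{f}$$

where $\boldsymbol{\alpha}$ is an $l$-dimensional vector, $\mathbf{f}$ is an $(L + 1)$-dimensional vector, and $\mathbf{e}$ is
an $l \times (L + 1)$ matrix, defined as

$$\boldsymbol{\alpha} = \begin{bmatrix} \alpha_k \\ \alpha_{k+1} \\ \vdots \\ \alpha_{k+l-1} \end{bmatrix}, \qquad
\mathbf{f} = \begin{bmatrix} f_0 \\ f_1 \\ \vdots \\ f_L \end{bmatrix}, \qquad
\mathbf{e} = \begin{bmatrix}
\epsilon_k & 0 & 0 & \cdots & 0 \\
\epsilon_{k+1} & \epsilon_k & 0 & \cdots & 0 \\
\epsilon_{k+2} & \epsilon_{k+1} & \epsilon_k & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
\epsilon_{k+l-1} & \epsilon_{k+l-2} & \epsilon_{k+l-3} & \cdots & \epsilon_{k+l-L-1}
\end{bmatrix} \tag{10-1-43}$$

Then

$$\delta^2(\boldsymbol{\epsilon}) = \boldsymbol{\alpha}'\boldsymbol{\alpha} = \mathbf{f}'\mathbf{e}'\mathbf{e}\mathbf{f} = \mathbf{f}'\mathbf{A}\mathbf{f} \tag{10-1-44}$$

where $\mathbf{A}$ is an $(L + 1) \times (L + 1)$ matrix of the form

$$\mathbf{A} = \mathbf{e}'\mathbf{e} = \begin{bmatrix}
\beta_0 & \beta_1 & \beta_2 & \cdots & \beta_L \\
\beta_1 & \beta_0 & \beta_1 & \cdots & \beta_{L-1} \\
\beta_2 & \beta_1 & \beta_0 & \cdots & \beta_{L-2} \\
\vdots & \vdots & \vdots & & \vdots \\
\beta_L & \beta_{L-1} & \beta_{L-2} & \cdots & \beta_0
\end{bmatrix} \tag{10-1-45}$$

and

$$\beta_m = \sum_{i=k}^{k+l-1-m} \epsilon_i \epsilon_{i+m} \tag{10-1-46}$$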

We may use either (10-1-40) and (10-1-41) or (10-1-45) and (10-1-46) in evaluating
the error rate performance. We consider these computations later. For now we
conclude that the probability of the subevent $E_3$, given by (10-1-39), may be
expressed as

$$P(E_3) = Q\left( \sqrt{\frac{6}{M^2 - 1} \gamma_{av} \delta^2(\boldsymbol{\epsilon})} \right) \tag{10-1-47}$$

where we have used the relation

$$d^2 = \frac{3}{M^2 - 1} T P_{av} \tag{10-1-48}$$

to eliminate $d^2$ and $\gamma_{av} = T P_{av} / N_0$. Note that, in the absence of intersymbol
interference, $\delta^2(\boldsymbol{\epsilon}) = 1$ and $P(E_3)$ is proportional to the symbol error
probability of $M$-ary PAM.

The probability of the subevent $E_2$ depends only on the statistical properties
of the input sequence. We assume that the information symbols are equally
probable and that the symbols in the transmitted sequence are statistically
independent. Then, for an error of the form $|\epsilon_i| = j$, $j = 1, 2, \ldots, M - 1$, there
are $M - j$ possible values of $I_i$ such that

$$I_i = \tilde{I}_i + 2d\epsilon_i$$

Hence

$$P(E_2) = \prod_{i=0}^{l-L-1} \frac{M - |\epsilon_i|}{M} \tag{10-1-49}$$


The probability of the subevent $E_1$ is much more difficult to compute exactly
because of its dependence on the subevent $E_3$. That is, we must compute
$P(E_1 \mid E_3)$. However, $P(E_1 \mid E_3) = 1 - P_M$, where $P_M$ is the symbol error
probability. Hence $P(E_1 \mid E_3)$ is well approximated (and upper-bounded) by
unity for reasonably low symbol error probabilities. Therefore, the probability
of the error event $\boldsymbol{\epsilon}$ is well approximated and upper-bounded as

$$P(\boldsymbol{\epsilon}) \le Q\left( \sqrt{\frac{6}{M^2 - 1} \gamma_{av} \delta^2(\boldsymbol{\epsilon})} \right) \prod_{i=0}^{l-L-1} \frac{M - |\epsilon_i|}{M} \tag{10-1-50}$$

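Let $E$ be the set of all error events $\boldsymbol{\epsilon}$ starting at time $k$ and let $w(\boldsymbol{\epsilon})$ be the
corresponding number of nonzero components (Hamming weight or number of
symbol errors) in each error event $\boldsymbol{\epsilon}$. Then the probability of a symbol error is
upper-bounded (union bound) as

$$P_M \le \sum_{\boldsymbol{\epsilon} \in E} w(\boldsymbol{\epsilon}) P(\boldsymbol{\epsilon}) \le \sum_{\boldsymbol{\epsilon} \in E} w(\boldsymbol{\epsilon})\, Q\left( \sqrt{\frac{6}{M^2 - 1} \gamma_{av} \delta^2(\boldsymbol{\epsilon})} \right) \prod_{i=0}^{l-L-1} \frac{M - |\epsilon_i|}{M} \tag{10-1-51}$$

Now let $D$ be the set of all $\delta(\boldsymbol{\epsilon})$. For each $\delta \in D$, let $E_\delta$ be the subset of error
events for which $\delta(\boldsymbol{\epsilon}) = \delta$. Then (10-1-51) may be expressed as

$$\begin{aligned}
P_M &\le \sum_{\delta \in D} Q\left( \sqrt{\frac{6}{M^2 - 1} \gamma_{av} \delta^2} \right) \left[ \sum_{\boldsymbol{\epsilon} \in E_\delta} w(\boldsymbol{\epsilon}) \prod_{i=0}^{l-L-1} \frac{M - |\epsilon_i|}{M} \right] \\
&= \sum_{\delta \in D} K_\delta\, Q\left( \sqrt{\frac{6}{M^2 - 1} \gamma_{av} \delta^2} \right)
\end{aligned} \tag{10-1-52}$$

where

$$K_\delta = \sum_{\boldsymbol{\epsilon} \in E_\delta} w(\boldsymbol{\epsilon}) \prod_{i=0}^{l-L-1} \frac{M - |\epsilon_i|}{M} \tag{10-1-53}$$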


The expression for the error probability in (10-1-52) is similar to the form of
the error probability for a convolutional code with soft-decision decoding given
by (8-2-26). The weighting factors $\{K_\delta\}$ may be determined by means of the
error state diagram, which is akin to the state diagram of a convolutional
encoder. This approach has been illustrated by Forney (1972) and Viterbi and
Omura (1979).

In general, however, the use of the error state diagram for computing $P_M$ is
tedious. Instead, we may simplify the computation of $P_M$ by focusing on the
dominant term in the summation of (10-1-52). Due to the exponential
dependence of each term in the sum, the expression $P_M$ is dominated by the
term corresponding to the minimum value of $\delta$, denoted as $\delta_{\min}$. Hence the
symbol error probability may be approximated as

$$P_M \approx K_{\delta_{\min}}\, Q\left( \sqrt{\frac{6}{M^2 - 1} \gamma_{av} \delta_{\min}^2} \right) \tag{10-1-54}$$

where

$$K_{\delta_{\min}} = \sum_{\boldsymbol{\epsilon} \in E_{\delta_{\min}}} w(\boldsymbol{\epsilon}) \prod_{i=0}^{l-L-1} \frac{M - |\epsilon_i|}{M} \tag{10-1-55}$$

In general, $\delta_{\min}^2 \le 1$. Hence, $10 \log \delta_{\min}^2$ represents the loss in SNR due to
intersymbol interference.
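To make the dominant-term approximation (10-1-54) concrete, the following Python sketch evaluates it with the standard identity Q(x) = erfc(x/sqrt(2))/2. The function names and the numerical inputs are assumptions introduced for illustration; in particular, the constant K must be supplied by the caller from (10-1-55).

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mlse_symbol_error_approx(M, gamma_av, delta2_min, K):
    """Dominant term K * Q(sqrt(6 / (M^2 - 1) * gamma_av * delta_min^2))."""
    return K * q_function(math.sqrt(6.0 / (M * M - 1) * gamma_av * delta2_min))


def snr_loss_db(delta2_min):
    """Loss in SNR due to ISI, -10 log10(delta_min^2), in dB."""
    return -10.0 * math.log10(delta2_min)


# Hypothetical comparison at gamma_av = 10 (10 dB) for binary PAM:
# an ideal channel (delta_min^2 = 1) versus a channel with delta_min^2 = 0.5.
p_ideal = mlse_symbol_error_approx(M=2, gamma_av=10.0, delta2_min=1.0, K=1.0)
p_isi = mlse_symbol_error_approx(M=2, gamma_av=10.0, delta2_min=0.5, K=1.0)
print(p_ideal, p_isi, snr_loss_db(0.5))
```

Because $\delta_{\min}^2$ multiplies $\gamma_{av}$ inside the square root, a value below one acts exactly like a reduction of the SNR per symbol, which is why the loss is quoted in decibels.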

The minimum value of $\delta$ may be determined either from (10-1-40) or from
evaluation of the quadratic form in (10-1-44) for different error sequences. In
the following two examples we use (10-1-40).


Example 10-1-3 

Consider a two-path channel ($L = 1$) with arbitrary coefficients $f_0$ and $f_1$
satisfying the constraint $f_0^2 + f_1^2 = 1$. The channel characteristic is

$$F(z) = f_0 + f_1 z^{-1} \tag{10-1-56}$$

For an error event of length $n$,

$$\epsilon(z) = \epsilon_0 + \epsilon_1 z^{-1} + \ldots + \epsilon_{n-1} z^{-(n-1)}, \qquad n \ge 1 \tag{10-1-57}$$

The product $\alpha(z) = F(z)\epsilon(z)$ may be expressed as

$$\alpha(z) = \alpha_0 + \alpha_1 z^{-1} + \ldots + \alpha_n z^{-n} \tag{10-1-58}$$

where $\alpha_0 = \epsilon_0 f_0$ and $\alpha_n = f_1 \epsilon_{n-1}$. Since $\epsilon_0 \ne 0$, $\epsilon_{n-1} \ne 0$, and

$$\delta^2(\boldsymbol{\epsilon}) = \sum_{k=0}^{n} \alpha_k^2$$

it follows that

$$\delta_{\min}^2 \ge f_0^2 + f_1^2 = 1 \tag{10-1-59}$$

Indeed, $\delta_{\min}^2 = 1$ when a single error occurs, i.e., $\epsilon(z) = \epsilon_0$. Thus, we
conclude that there is no loss in SNR in maximum-likelihood sequence
estimation of the information symbols when the channel dispersion has
length 2.


Example 10-1-4 

The controlled intersymbol interference in a partial response signal may be
viewed as having been generated by a time-dispersive channel. Thus, the
intersymbol interference from a duobinary pulse may be represented by the
(normalized) channel characteristic

$$F(z) = \sqrt{\tfrac{1}{2}} + \sqrt{\tfrac{1}{2}}\, z^{-1} \tag{10-1-60}$$

Similarly, the representation for a modified duobinary pulse is

$$F(z) = \sqrt{\tfrac{1}{2}} - \sqrt{\tfrac{1}{2}}\, z^{-2} \tag{10-1-61}$$

The minimum distance $\delta_{\min}^2 = 1$ for any error event of the form

$$\epsilon(z) = \pm\left( 1 - z^{-1} + z^{-2} - \ldots + (-1)^{n-1} z^{-(n-1)} \right), \qquad n \ge 1 \tag{10-1-62}$$

for the channel given by (10-1-60), since the interior terms of $F(z)\epsilon(z)$ cancel and

$$\alpha(z) = \pm\sqrt{\tfrac{1}{2}} \left[ 1 + (-1)^{n-1} z^{-n} \right]$$

Similarly, when

$$\epsilon(z) = \pm\left( 1 + z^{-2} + z^{-4} + \ldots + z^{-2(n-1)} \right), \qquad n \ge 1 \tag{10-1-63}$$

$\delta_{\min}^2 = 1$ for the channel given by (10-1-61), since

$$\alpha(z) = \pm\sqrt{\tfrac{1}{2}} \mp \sqrt{\tfrac{1}{2}}\, z^{-2n}$$

Hence MLSE of these two partial response signals results in no loss in SNR.
In contrast, the suboptimum symbol-by-symbol detection described previously
resulted in a 2.1 dB loss.
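The minimum euclidean weight claimed for the two partial response channels can be checked by brute force over short error events. The sketch below is an assumption-laden illustration: the function name `min_euclidean_weight`, the binary alphabet (M = 2), and the maximum event length of 6 are arbitrary choices, and a finite search of this kind only supports the result rather than proving it.

```python
from itertools import product


def min_euclidean_weight(f, M, max_len):
    """Smallest delta^2 = sum_i alpha_i^2 over error events eps of length
    1..max_len with entries in -(M-1)..(M-1) and nonzero first and last
    elements, where alpha is the convolution of f with eps."""
    best_d2, best_eps = None, None
    for n in range(1, max_len + 1):
        for eps in product(range(-(M - 1), M), repeat=n):
            if eps[0] == 0 or eps[-1] == 0:
                continue
            alpha = [0.0] * (n + len(f) - 1)
            for i, e in enumerate(eps):
                for j, fj in enumerate(f):
                    alpha[i + j] += fj * e
            d2 = sum(a * a for a in alpha)
            if best_d2 is None or d2 < best_d2 - 1e-12:
                best_d2, best_eps = d2, eps
    return best_d2, best_eps


s = 0.5 ** 0.5
d2_duo, eps_duo = min_euclidean_weight([s, s], M=2, max_len=6)
d2_mod, eps_mod = min_euclidean_weight([s, 0.0, -s], M=2, max_len=6)
print(d2_duo, d2_mod)         # both should be 1 up to rounding
```

The search never finds a weight below one because the first and last coefficients of $\alpha(z)$ each contribute at least one half, which is the same argument used in Example 10-1-3.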

The constant $K_{\delta_{\min}}$ is easily evaluated for these two signals. With
precoding, the number of output symbol errors (Hamming weight) associated
with the error events in (10-1-62) and (10-1-63) is two. Hence,

$$K_{\delta_{\min}} = 2 \sum_{n=1}^{\infty} \left( \frac{M - 1}{M} \right)^n = 2(M - 1) \tag{10-1-64}$$

On the other hand, without precoding, these error events result in $n$ symbol
errors, and, hence,

$$K_{\delta_{\min}} = 2 \sum_{n=1}^{\infty} n \left( \frac{M - 1}{M} \right)^n = 2M(M - 1) \tag{10-1-65}$$

As a final exercise, we consider the evaluation of $\delta_{\min}^2$ from the quadratic
form in (10-1-44). The matrix $\mathbf{A}$ of the quadratic form is positive-definite;
hence, all its eigenvalues are positive. If $\{\mu_k(\boldsymbol{\epsilon})\}$ are the eigenvalues and $\{\mathbf{v}_k(\boldsymbol{\epsilon})\}$
are the corresponding orthonormal eigenvectors of $\mathbf{A}$ for an error event $\boldsymbol{\epsilon}$ then
the quadratic form in (10-1-44) can be expressed as

$$\delta^2(\boldsymbol{\epsilon}) = \sum_{k=1}^{L+1} \mu_k(\boldsymbol{\epsilon}) \left[ \mathbf{f}'\mathbf{v}_k(\boldsymbol{\epsilon}) \right]^2 \tag{10-1-66}$$

In other words, $\delta^2(\boldsymbol{\epsilon})$ is expressed as a linear combination of the squared
projections of the channel vector $\mathbf{f}$ onto the eigenvectors of $\mathbf{A}$. Each squared
projection in the sum is weighted by the corresponding eigenvalue $\mu_k(\boldsymbol{\epsilon})$,
$k = 1, 2, \ldots, L + 1$. Then

$$\delta_{\min}^2 = \min_{\boldsymbol{\epsilon}} \delta^2(\boldsymbol{\epsilon}) \tag{10-1-67}$$

It is interesting to note that the worst channel characteristic of a given
length $L + 1$ can be obtained by finding the eigenvector corresponding to the
minimum eigenvalue. Thus, if $\mu_{\min}(\boldsymbol{\epsilon})$ is the minimum eigenvalue for a given
error event $\boldsymbol{\epsilon}$ and $\mathbf{v}_{\min}(\boldsymbol{\epsilon})$ is the corresponding eigenvector then

$$\mu_{\min} = \min_{\boldsymbol{\epsilon}} \mu_{\min}(\boldsymbol{\epsilon})$$

$$\mathbf{f} = \min_{\boldsymbol{\epsilon}} \mathbf{v}_{\min}(\boldsymbol{\epsilon})$$

and

$$\delta_{\min}^2 = \mu_{\min}$$


Example 10-1-5 

Let us determine the worst time-dispersive channel of length 3 (L = 2) by
finding the minimum eigenvalue of A for different error events. Thus,

$$F(z) = f_0 + f_1 z^{-1} + f_2 z^{-2}$$

where $f_0$, $f_1$, and $f_2$ are the components of the eigenvector of A
corresponding to the minimum eigenvalue. An error event of the form

$$\epsilon(z) = 1 - z^{-1}$$

results in a matrix

$$\mathbf{A} = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}$$

which has the eigenvalues $\mu_1 = 2$, $\mu_2 = 2 + \sqrt{2}$, $\mu_3 = 2 - \sqrt{2}$.
The eigenvector corresponding to $\mu_3$ is

$$\mathbf{v}_3' = \begin{bmatrix} \tfrac{1}{2} & \sqrt{\tfrac{1}{2}} & \tfrac{1}{2} \end{bmatrix} \tag{10-1-68}$$

We may also consider the dual error event

$$\epsilon(z) = 1 + z^{-1}$$


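which results in the matrix

$$\mathbf{A} = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix}$$

This matrix has eigenvalues identical to those of the one for $\epsilon(z) = 1 - z^{-1}$.
The corresponding eigenvector for $\mu_3 = 2 - \sqrt{2}$ is

$$\mathbf{v}_3' = \begin{bmatrix} -\tfrac{1}{2} & \sqrt{\tfrac{1}{2}} & -\tfrac{1}{2} \end{bmatrix} \tag{10-1-69}$$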


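Any other error events lead to larger values for $\mu_{\min}$. Hence,
$\mu_{\min} = 2 - \sqrt{2}$ and the worst-case channel is either

$$\begin{bmatrix} \tfrac{1}{2} & \sqrt{\tfrac{1}{2}} & \tfrac{1}{2} \end{bmatrix}$$

or

$$\begin{bmatrix} -\tfrac{1}{2} & \sqrt{\tfrac{1}{2}} & -\tfrac{1}{2} \end{bmatrix}$$

The loss in SNR from the channel is

$$-10 \log_{10} \delta_{\min}^2 = -10 \log_{10} \mu_{\min} = 2.3\ \text{dB}$$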

Repetitions of the above computation for channels with L = 3, 4, and 5
yield the results given in Table 10-1-1.
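The length-3 result can be reproduced without an eigenvalue solver. The sketch below assumes that $\delta^2(\epsilon)$ equals the energy of the error sequence after it passes through the channel, which is the quantity that the quadratic form in (10-1-44) is understood to represent; the helper names and the grid resolution are ours.

```python
import math

def convolve(a, b):
    """Full linear convolution of two real sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def distance_sq(f, eps):
    """delta^2(eps): energy of the error sequence filtered by channel f (assumed form)."""
    return sum(v * v for v in convolve(f, eps))

def min_over_unit_sphere(eps, steps=120):
    """Coarse grid search over unit-energy 3-tap channels for the smallest delta^2."""
    best = float("inf")
    for i in range(steps + 1):
        th = math.pi * i / steps
        for j in range(2 * steps):
            ph = math.pi * j / steps
            f = [math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th)]
            best = min(best, distance_sq(f, eps))
    return best

f_worst = [0.5, math.sqrt(0.5), 0.5]           # eigenvector from (10-1-68)
d2 = distance_sq(f_worst, [1.0, -1.0])         # compare with 2 - sqrt(2)
loss_db = -10.0 * math.log10(d2)               # compare with the 2.3 dB entry
d2_grid = min_over_unit_sphere([1.0, -1.0])    # grid minimum, never below d2
```

The grid search only confirms, to within its resolution, that no other unit-energy channel of length 3 gives a smaller distance for this error event; it is not a substitute for the eigenvalue argument.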


10-2 LINEAR EQUALIZATION 

The MLSE for a channel with ISI has a computational complexity that grows
exponentially with the length of the channel time dispersion. If the size of the
symbol alphabet is M and the number of interfering symbols contributing to
ISI is L, the Viterbi algorithm computes $M^{L+1}$ metrics for each new received
symbol. In most channels of practical interest, such a large computational
complexity is prohibitively expensive to implement.

In this and the following sections, we describe two suboptimum channel
equalization approaches to compensate for the ISI. One approach employs a
linear transversal filter, which is described in this section. These filter
structures have a computational complexity that is a linear function of the
channel dispersion length L.

TABLE 10-1-1 MAXIMUM PERFORMANCE LOSS AND CORRESPONDING
CHANNEL CHARACTERISTICS

Channel length   Performance loss
L + 1            $-10 \log \delta_{\min}^2$ (dB)   Minimum-distance channel
3                2.3                               0.50, 0.71, 0.50
4                4.2                               0.38, 0.60, 0.60, 0.38
5                5.7                               0.29, 0.50, 0.58, 0.50, 0.29
6                7.0                               0.23, 0.42, 0.52, 0.52, 0.42, 0.23

FIGURE 10-2-1 Linear transversal filter.

The linear filter most often used for equalization is the transversal filter
shown in Fig. 10-2-1. Its input is the sequence $\{v_k\}$ given in (10-1-16) and
its output is the estimate of the information sequence $\{I_k\}$. The estimate of
the kth symbol may be expressed as

$$\hat{I}_k = \sum_{j=-K}^{K} c_j v_{k-j} \tag{10-2-1}$$

where $\{c_j\}$ are the 2K + 1 complex-valued tap weight coefficients of the filter.
The estimate $\hat{I}_k$ is quantized to the nearest (in distance) information
symbol to form the decision $\tilde{I}_k$. If $\tilde{I}_k$ is not identical to the
transmitted information symbol $I_k$, an error has been made.
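The filtering and decision steps in (10-2-1) can be sketched as follows. This is an illustration only: the function names, the zero-padding at the ends of the block, the two-tap channel, and the tap values are assumptions made for the example and are not taken from the text.

```python
def equalize(v, c, K):
    """Estimates I_hat[k] = sum_{j=-K..K} c[j + K] * v[k - j], with v taken as zero outside the block."""
    n = len(v)
    out = []
    for k in range(n):
        acc = 0.0
        for j in range(-K, K + 1):
            if 0 <= k - j < n:
                acc += c[j + K] * v[k - j]
        out.append(acc)
    return out

def decide(estimate, alphabet):
    """Quantize an estimate to the nearest symbol of the alphabet."""
    return min(alphabet, key=lambda s: abs(estimate - s))

# Noise-free binary example over a hypothetical channel f = (1, 0.5).
symbols = [1, -1, -1, 1, 1, -1]
f = [1.0, 0.5]
v = [sum(f[m] * symbols[k - m] for m in range(len(f)) if k - m >= 0)
     for k in range(len(symbols))]
c = [0.0, 0.0, 1.0, -0.5, 0.25]        # taps c_{-2}..c_{2}, chosen for illustration
estimates = equalize(v, c, K=2)
decisions = [decide(e, [-1, 1]) for e in estimates]
```

In this noise-free case the decisions reproduce the transmitted symbols, while the estimates themselves still deviate slightly from the symbol values because a finite equalizer leaves some residual interference.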

Considerable research has been performed on the criterion for optimizing
the filter coefficients $\{c_j\}$. Since the most meaningful measure of performance
for a digital communications system is the average probability of error, it is
desirable to choose the coefficients to minimize this performance index.
However, the probability of error is a highly nonlinear function of $\{c_j\}$.
Consequently, the probability of error as a performance index for optimizing
the tap weight coefficients of the equalizer is impractical.

Two criteria have found widespread use in optimizing the equalizer
coefficients $\{c_j\}$. One is the peak distortion criterion and the other is the
mean square error criterion.

10-2-1 Peak Distortion Criterion 

The peak distortion is simply defined as the worst-case intersymbol
interference at the output of the equalizer. The minimization of this performance
index is called the peak distortion criterion. First we consider the minimization
of the peak distortion assuming that the equalizer has an infinite number of
taps. Then we shall discuss the case in which the transversal equalizer spans a
finite time duration.

We observe that the cascade of the discrete-time linear filter model having
an impulse response $\{f_n\}$ and an equalizer having an impulse response $\{c_n\}$
can be represented by a single equivalent filter having the impulse response

$$q_n = \sum_{j=-\infty}^{\infty} c_j f_{n-j} \tag{10-2-2}$$

That is, $\{q_n\}$ is simply the convolution of $\{c_n\}$ and $\{f_n\}$. The equalizer
is assumed to have an infinite number of taps. Its output at the kth sampling
instant can be expressed in the form

$$\hat{I}_k = q_0 I_k + \sum_{n \ne k} I_n q_{k-n} + \sum_{j=-\infty}^{\infty} c_j \eta_{k-j} \tag{10-2-3}$$

The first term in (10-2-3) represents a scaled version of the desired symbol.
For convenience, we normalize $q_0$ to unity. The second term is the intersymbol
interference. The peak value of this interference, which is called the peak
distortion, is

$$\mathcal{D}(\mathbf{c}) = \sum_{\substack{n=-\infty \\ n \ne 0}}^{\infty} |q_n|
= \sum_{\substack{n=-\infty \\ n \ne 0}}^{\infty} \left| \sum_{j=-\infty}^{\infty} c_j f_{n-j} \right| \tag{10-2-4}$$

Thus, $\mathcal{D}(\mathbf{c})$ is a function of the equalizer tap weights.
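The peak distortion can be evaluated directly from the taps of the channel and the equalizer. The sketch below is an illustration with a finite number of taps; the function names, the dictionary representation of $\{q_n\}$, the division by $|q_0|$ (our way of enforcing the normalization $q_0 = 1$), and the example values are assumptions.

```python
def combined_response(c, f, K):
    """q_n = sum_j c_j f_{n-j}, returned as a dict keyed by n.

    c holds c_{-K}..c_{K}; f holds f_0..f_L.
    """
    q = {}
    for j in range(-K, K + 1):
        for m, fm in enumerate(f):
            q[j + m] = q.get(j + m, 0.0) + c[j + K] * fm
    return q

def peak_distortion(q):
    """Sum of |q_n| over n != 0, normalized by |q_0|."""
    return sum(abs(v) for n, v in q.items() if n != 0) / abs(q[0])

f = [1.0, 0.5]                                   # hypothetical two-tap channel
d_before = peak_distortion(combined_response([1.0], f, 0))
d_after = peak_distortion(combined_response([0.0, 0.0, 1.0, -0.5, 0.25], f, 2))
```

With these illustrative values the single-tap (unequalized) case has a peak distortion of 0.5, and the five-tap equalizer reduces it to 0.125.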

With an equalizer having an infinite number of taps, it is possible to select
the tap weights so that $\mathcal{D}(\mathbf{c}) = 0$, i.e., $q_n = 0$ for all n
except n = 0. That is, the intersymbol interference can be completely
eliminated. The values of the tap weights for accomplishing this goal are
determined from the condition

$$q_n = \sum_{j=-\infty}^{\infty} c_j f_{n-j} = \begin{cases} 1 & (n = 0) \\ 0 & (n \ne 0) \end{cases} \tag{10-2-5}$$

By taking the z transform of (10-2-5), we obtain

$$Q(z) = C(z)F(z) = 1 \tag{10-2-6}$$

or, simply,

$$C(z) = \frac{1}{F(z)} \tag{10-2-7}$$

where C(z) denotes the z transform of the $\{c_j\}$. Note that the equalizer, with
transfer function C(z), is simply the inverse filter to the linear filter model
F(z). In other words, complete elimination of the intersymbol interference
requires the use of an inverse filter to F(z). We call such a filter a zero-forcing
filter. Figure 10-2-2 illustrates in block diagram the equivalent discrete-time
channel and equalizer.

FIGURE 10-2-2 Block diagram of channel with zero-forcing equalizer.
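As a worked illustration of the inverse filter, consider a two-tap channel. The channel and the symbol a are assumptions chosen for this sketch and do not appear in the text.

```latex
\begin{align*}
F(z) &= 1 + a z^{-1}, \qquad |a| < 1 \\
C(z) &= \frac{1}{F(z)} = \sum_{j=0}^{\infty} (-a)^{j} z^{-j}
  \quad\Longrightarrow\quad c_j = (-a)^{j}\ (j \ge 0), \qquad c_j = 0\ (j < 0) \\
q_n &= \sum_{j} c_j f_{n-j} = c_n + a\,c_{n-1}
  = (-a)^{n} + a(-a)^{n-1} = 0 \quad (n \ge 1), \qquad q_0 = c_0 = 1
\end{align*}
```

The tap magnitudes decay as $|a|^j$, so the number of taps that carry significant weight grows as $|a|$ approaches unity.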

The cascade of the noise-whitening filter having the transfer function
$1/F^*(z^{-1})$ and the zero-forcing equalizer having the transfer function 1/F(z)
results in an equivalent zero-forcing equalizer having the transfer function

$$C'(z) = \frac{1}{F(z)F^*(z^{-1})} = \frac{1}{X(z)} \tag{10-2-8}$$

as shown in Fig. 10-2-3. This combined filter has as its input the sequence $\{y_k\}$
of samples from the matched filter, given by (10-1-10). Its output consists of
the desired symbols corrupted only by additive zero-mean gaussian noise. The
impulse response of the combined filter is

$$c_k' = \frac{1}{2\pi j} \oint C'(z) z^{k-1}\,dz = \frac{1}{2\pi j} \oint \frac{z^{k-1}}{X(z)}\,dz \tag{10-2-9}$$

where the integration is performed on a closed contour that lies within the
region of convergence of C'(z). Since X(z) is a polynomial with 2L roots
$(\rho_1, \rho_2, \ldots, \rho_L, 1/\rho_1^*, 1/\rho_2^*, \ldots, 1/\rho_L^*)$, it
follows that C'(z) must converge in an annular region in the z plane that
includes the unit circle ($z = e^{j\theta}$). Consequently, the closed contour in
the integral can be the unit circle.

The performance of the infinite-tap equalizer that completely eliminates the
intersymbol interference can be expressed in terms of the signal-to-noise ratio
(SNR) at its output. For mathematical convenience, we normalize the received
signal energy to unity.† This implies that $q_0 = 1$ and that the expected value of
$|I_k|^2$ is also unity. Then the SNR is simply the reciprocal of the noise variance
$\sigma_n^2$ at the output of the equalizer.

FIGURE 10-2-3 Block diagram of channel with equivalent zero-forcing equalizer.

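The value of $\sigma_n^2$ can be simply determined by observing that the noise
sequence $\{\nu_k\}$ at the input to the equivalent zero-forcing equalizer C'(z) has
zero mean and a power spectral density

$$\Phi_{\nu\nu}(\omega) = N_0 X(e^{j\omega T}), \qquad |\omega| \le \frac{\pi}{T} \tag{10-2-10}$$

where $X(e^{j\omega T})$ is obtained from X(z) by the substitution $z = e^{j\omega T}$.
Since C'(z) = 1/X(z), it follows that the noise sequence at the output of the
equalizer has a power spectral density

$$\Phi_{nn}(\omega) = \frac{N_0}{X(e^{j\omega T})}, \qquad |\omega| \le \frac{\pi}{T} \tag{10-2-11}$$

Consequently, the variance of the noise variable at the output of the equalizer
is

$$\sigma_n^2 = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} \Phi_{nn}(\omega)\,d\omega
= \frac{T N_0}{2\pi} \int_{-\pi/T}^{\pi/T} \frac{d\omega}{X(e^{j\omega T})} \tag{10-2-12}$$

and the SNR for the zero-forcing equalizer is

$$\gamma_\infty = \frac{1}{\sigma_n^2}
= \left[ \frac{T N_0}{2\pi} \int_{-\pi/T}^{\pi/T} \frac{d\omega}{X(e^{j\omega T})} \right]^{-1} \tag{10-2-13}$$

where the subscript on $\gamma$ indicates that the equalizer has an infinite number of
taps.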
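The integral in (10-2-13) is easy to evaluate numerically. The following sketch assumes a two-tap channel with real taps normalized to unit energy, for which $X(e^{j\omega T}) = 1 + 2 f_0 f_1 \cos \omega T$; the function name, the midpoint rule, and the numerical values are our choices.

```python
import math

def zf_output_snr(f0, f1, n0, steps=20000):
    """gamma_inf = [ (N0 / 2 pi) * integral over theta in (-pi, pi) of d(theta) / X(theta) ]^-1.

    theta = omega * T, and X(theta) = 1 + 2 f0 f1 cos(theta) for a two-tap
    channel with f0^2 + f1^2 = 1 (an assumption of this sketch).
    """
    b = 2.0 * f0 * f1
    h = 2.0 * math.pi / steps
    integral = 0.0
    for i in range(steps):
        theta = -math.pi + (i + 0.5) * h      # midpoint rule
        integral += h / (1.0 + b * math.cos(theta))
    return 1.0 / (n0 * integral / (2.0 * math.pi))

n0 = 0.1
gamma_zf = zf_output_snr(0.8, 0.6, n0)        # channel with ISI
gamma_ideal = zf_output_snr(1.0, 0.0, n0)     # no ISI: X = 1, so the SNR is 1/N0
loss_db = 10.0 * math.log10(gamma_ideal / gamma_zf)
```

For this channel the standard integral of $1/(1 + b\cos\theta)$ gives $\gamma_\infty = |f_0^2 - f_1^2|/N_0$, so the numerical value can be compared with 0.28/N0. As the two taps approach equal magnitude the spectrum develops a null and the output SNR tends to zero.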

The spectral characteristic $X(e^{j\omega T})$ corresponding to the Fourier transform
of the sampled sequence $\{x_k\}$ has an interesting relationship to the analog
filter $H(\omega)$ used at the receiver. Since

$$x_k = \int_{-\infty}^{\infty} h^*(t)\,h(t + kT)\,dt$$

use of Parseval's theorem yields

$$x_k = \frac{1}{2\pi} \int_{-\infty}^{\infty} |H(\omega)|^2 e^{j\omega kT}\,d\omega \tag{10-2-14}$$

where $H(\omega)$ is the Fourier transform of h(t). But the integral in (10-2-14) can
be expressed in the form

$$x_k = \frac{1}{2\pi} \int_{-\pi/T}^{\pi/T} \left[ \sum_{n=-\infty}^{\infty} \left| H\!\left(\omega + \frac{2\pi n}{T}\right) \right|^2 \right] e^{j\omega kT}\,d\omega \tag{10-2-15}$$

† This normalization is used throughout this chapter for mathematical convenience.

Now, the Fourier transform of $\{x_k\}$ is

$$X(e^{j\omega T}) = \sum_{k=-\infty}^{\infty} x_k e^{-j\omega kT} \tag{10-2-16}$$

and the inverse transform yields

$$x_k = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} X(e^{j\omega T}) e^{j\omega kT}\,d\omega \tag{10-2-17}$$

From a comparison of (10-2-15) and (10-2-17), we obtain the desired
relationship between $X(e^{j\omega T})$ and $H(\omega)$. That is,

$$X(e^{j\omega T}) = \frac{1}{T} \sum_{n=-\infty}^{\infty} \left| H\!\left(\omega + \frac{2\pi n}{T}\right) \right|^2, \qquad |\omega| \le \frac{\pi}{T} \tag{10-2-18}$$

where the right-hand side of (10-2-18) is called the folded spectrum of $|H(\omega)|^2$.
We also observe that $|H(\omega)|^2 = X(\omega)$, where $X(\omega)$ is the Fourier
transform of the waveform x(t) and x(t) is the response of the matched filter to
the input h(t). Therefore the right-hand side of (10-2-18) can also be expressed
in terms of $X(\omega)$.

Substitution for $X(e^{j\omega T})$ in (10-2-13) using the result in (10-2-18) yields
the desired expression for the SNR in the form

$$\gamma_\infty = \left[ \frac{T^2 N_0}{2\pi} \int_{-\pi/T}^{\pi/T} \frac{d\omega}{\sum_{n=-\infty}^{\infty} |H(\omega + 2\pi n/T)|^2} \right]^{-1} \tag{10-2-19}$$

We observe that if the folded spectral characteristic of $H(\omega)$ possesses any
zeros, the integrand becomes infinite and the SNR goes to zero. In other
words, the performance of the equalizer is poor whenever the folded spectral
characteristic possesses nulls or takes on small values. This behavior occurs
primarily because the equalizer, in eliminating the intersymbol interference,
enhances the additive noise. For example, if the channel contains a spectral
null in its frequency response, the linear zero-forcing equalizer attempts to
compensate for this by introducing an infinite gain at that frequency. But this
compensates for the channel distortion at the expense of enhancing the
additive noise. On the other hand, an ideal channel coupled with an
appropriate signal design that results in no intersymbol interference will have a
folded spectrum that satisfies the condition

$$\sum_{n=-\infty}^{\infty} \left| H\!\left(\omega + \frac{2\pi n}{T}\right) \right|^2 = T, \qquad |\omega| \le \frac{\pi}{T} \tag{10-2-20}$$

In this case, the SNR achieves its maximum value, namely,

$$\gamma_\infty = \frac{1}{N_0} \tag{10-2-21}$$

Finite-Length Equalizer Let us now turn our attention to an equalizer
having 2K + 1 taps. Since $c_j = 0$ for |j| > K, the convolution of $\{f_n\}$ with
$\{c_n\}$ is zero outside the range $-K \le n \le K + L - 1$. That is, $q_n = 0$ for
n < -K and n > K + L - 1. With $q_0$ normalized to unity, the peak distortion is

$$\mathcal{D}(\mathbf{c}) = \sum_{\substack{n=-K \\ n \ne 0}}^{K+L-1} |q_n|
= \sum_{\substack{n=-K \\ n \ne 0}}^{K+L-1} \left| \sum_{j} c_j f_{n-j} \right| \tag{10-2-22}$$

Although the equalizer has 2K + 1 adjustable parameters, there are 2K + L
nonzero values in the response $\{q_n\}$. Therefore, it is generally impossible to
completely eliminate the intersymbol interference at the output of the
equalizer. There is always some residual interference when the optimum
coefficients are used. The problem is to minimize $\mathcal{D}(\mathbf{c})$ with
respect to the coefficients $\{c_j\}$.

The peak distortion given by (10-2-22) has been shown by Lucky (1965) to
be a convex function of the coefficients $\{c_j\}$. That is, it possesses a global
minimum and no relative minima. Its minimization can be carried out
numerically using, for example, the method of steepest descent. Little more
can be said for the general solution to this minimization problem. However, for
one special but important case, the solution for the minimization of
$\mathcal{D}(\mathbf{c})$ is known. This is the case in which the distortion at the
input to the equalizer, defined as

$$D_0 = \frac{1}{|f_0|} \sum_{n=1}^{L} |f_n| \tag{10-2-23}$$

is less than unity. This condition is equivalent to having the eye open prior to
equalization. That is, the intersymbol interference is not severe enough to close
the eye. Under this condition, the peak distortion $\mathcal{D}(\mathbf{c})$ is
minimized by selecting the equalizer coefficients to force $q_n = 0$ for
$1 \le |n| \le K$ and $q_0 = 1$. In other words, the general solution to the
minimization of $\mathcal{D}(\mathbf{c})$, when $D_0 < 1$, is the zero-forcing
solution for $\{q_n\}$ in the range $1 \le |n| \le K$. However, the values of
$\{q_n\}$ for $K + 1 \le n \le K + L - 1$ are nonzero, in general. These nonzero
values constitute the residual intersymbol interference at the output of the
equalizer.
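The finite-length zero-forcing solution amounts to solving 2K + 1 linear equations. The sketch below assumes real-valued taps and a hypothetical two-tap channel with $D_0 = 0.5$; the helper `solve` is a plain Gaussian elimination written for this example, and all names are ours.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def zero_forcing_taps(f, K):
    """Taps c_{-K}..c_{K} that force q_0 = 1 and q_n = 0 for 1 <= |n| <= K."""
    idx = range(-K, K + 1)
    a = [[f[n - j] if 0 <= n - j < len(f) else 0.0 for j in idx] for n in idx]
    b = [1.0 if n == 0 else 0.0 for n in idx]
    return solve(a, b)

def residual_distortion(f, c, K):
    """Sum of |q_n| over the values of n outside the forced range |n| <= K."""
    total = 0.0
    for n in range(K + 1, K + len(f)):
        q_n = sum(c[j + K] * f[n - j] for j in range(-K, K + 1) if 0 <= n - j < len(f))
        total += abs(q_n)
    return total

f = [1.0, 0.5]                        # hypothetical channel, D_0 = 0.5 < 1
c_zf = zero_forcing_taps(f, K=2)
d_res = residual_distortion(f, c_zf, K=2)
```

For this channel the residual distortion after equalization is 0.125, compared with an input distortion of 0.5, which illustrates that a finite zero-forcing equalizer reduces but does not remove the interference.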


10-2-2 Mean Square Error (MSE) Criterion 

In the MSE criterion, the tap weight coefficients $\{c_j\}$ of the equalizer are
adjusted to minimize the mean square value of the error

$$\varepsilon_k = I_k - \hat{I}_k \tag{10-2-24}$$

where $I_k$ is the information symbol transmitted in the kth signaling interval
and $\hat{I}_k$ is the estimate of that symbol at the output of the equalizer,
defined in (10-2-1). When the information symbols $\{I_k\}$ are complex-valued,
the performance index for the MSE criterion, denoted by J, is defined as

$$J = E|\varepsilon_k|^2 = E|I_k - \hat{I}_k|^2 \tag{10-2-25}$$

On the other hand, when the information symbols are real-valued, the
performance index is simply the square of the real part of $\varepsilon_k$. In
either case, J is a quadratic function of the equalizer coefficients $\{c_j\}$. In
the following discussion, we consider the minimization of the complex-valued
form given in (10-2-25).


Infinite-Length Equalizer First, we shall derive the tap weight coefficients
that minimize J when the equalizer has an infinite number of taps. In this case,
the estimate $\hat{I}_k$ is expressed as

$$\hat{I}_k = \sum_{j=-\infty}^{\infty} c_j v_{k-j} \tag{10-2-26}$$

Substitution of (10-2-26) into the expression for J given in (10-2-25) and
expansion of the result yields a quadratic function of the coefficients $\{c_j\}$.
This function can be easily minimized with respect to the $\{c_j\}$ to yield a set
(infinite in number) of linear equations for the $\{c_j\}$. Alternatively, the set
of linear equations can be obtained by invoking the orthogonality principle in
mean square estimation. That is, we select the coefficients $\{c_j\}$ to render the
error $\varepsilon_k$ orthogonal to the signal sequence $\{v_{k-l}^*\}$ for
$-\infty < l < \infty$. Thus,

$$E(\varepsilon_k v_{k-l}^*) = 0, \qquad -\infty < l < \infty \tag{10-2-27}$$

Substitution for $\varepsilon_k$ in (10-2-27) yields

$$E\left[ \left( I_k - \sum_{j=-\infty}^{\infty} c_j v_{k-j} \right) v_{k-l}^* \right] = 0$$

or, equivalently,

$$\sum_{j=-\infty}^{\infty} c_j E(v_{k-j} v_{k-l}^*) = E(I_k v_{k-l}^*), \qquad -\infty < l < \infty \tag{10-2-28}$$


To evaluate the moments in (10-2-28), we use the expression for $v_k$ given in
(10-1-16). Thus, we obtain

$$E(v_{k-j} v_{k-l}^*) = \sum_{n=0}^{L} f_n^* f_{n+l-j} + N_0 \delta_{lj}
= \begin{cases} x_{l-j} + N_0 \delta_{lj} & (|l - j| \le L) \\ 0 & (\text{otherwise}) \end{cases} \tag{10-2-29}$$

and

$$E(I_k v_{k-l}^*) = \begin{cases} f_{-l}^* & (-L \le l \le 0) \\ 0 & (\text{otherwise}) \end{cases} \tag{10-2-30}$$

Now, if we substitute (10-2-29) and (10-2-30) into (10-2-28) and take the z
transform of both sides of the resulting equation, we obtain

$$C(z)[F(z)F^*(z^{-1}) + N_0] = F^*(z^{-1}) \tag{10-2-31}$$

Therefore, the transfer function of the equalizer based on the MSE criterion is

$$C(z) = \frac{F^*(z^{-1})}{F(z)F^*(z^{-1}) + N_0} \tag{10-2-32}$$


When the noise-whitening filter is incorporated into C(z), we obtain an
equivalent equalizer having the transfer function

$$C'(z) = \frac{1}{F(z)F^*(z^{-1}) + N_0} = \frac{1}{X(z) + N_0} \tag{10-2-33}$$

We observe that the only difference between this expression for C'(z) and
the one based on the peak distortion criterion is the noise spectral density
factor $N_0$ that appears in (10-2-33). When $N_0$ is very small in comparison with
the signal, the coefficients that minimize the peak distortion
$\mathcal{D}(\mathbf{c})$ are approximately equal to the coefficients that minimize
the MSE performance index J. That is, in the limit as $N_0 \to 0$, the two criteria
yield the same solution for the tap weights. Consequently, when $N_0 = 0$, the
minimization of the MSE results in complete elimination of the intersymbol
interference. On the other hand, that is not the case when $N_0 \ne 0$. In general,
when $N_0 \ne 0$, there is both residual intersymbol interference and additive
noise at the output of the equalizer.
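The effect of the $N_0$ term is easiest to see by comparing the two equivalent equalizer gains at a single frequency. The sketch below is an illustration; the function name and the sample values of the folded spectrum are assumptions.

```python
import math

def equalizer_gains(x, n0):
    """Return (zero-forcing, MSE) gains 1/X and 1/(X + N0) at one frequency.

    x is the value of the folded spectrum X(e^{jwT}) at that frequency;
    an exact spectral null (x == 0) is reported as an infinite gain.
    """
    zf = math.inf if x == 0.0 else 1.0 / x
    return zf, 1.0 / (x + n0)

n0 = 0.1
gains = {x: equalizer_gains(x, n0) for x in (1.0, 0.01, 0.0)}
```

Where the spectrum is large the two gains are nearly equal, whereas near a null the zero-forcing gain grows without bound and the MSE gain is limited to at most 1/N0.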

A measure of the residual intersymbol interference and additive noise is
obtained by evaluating the minimum value of J, denoted by $J_{\min}$, when the
transfer function C(z) of the equalizer is given by (10-2-32). Since
$J = E|\varepsilon_k|^2 = E(\varepsilon_k I_k^*) - E(\varepsilon_k \hat{I}_k^*)$, and
since $E(\varepsilon_k \hat{I}_k^*) = 0$ by virtue of the orthogonality conditions
given in (10-2-27), it follows that

$$J_{\min} = E(\varepsilon_k I_k^*)
= E|I_k|^2 - \sum_{j=-\infty}^{\infty} c_j E(v_{k-j} I_k^*)
= 1 - \sum_{j=-\infty}^{\infty} c_j f_{-j} \tag{10-2-34}$$

This particular form for $J_{\min}$ is not very informative. More insight on the
performance of the equalizer as a function of the channel characteristics is
obtained when the summation in (10-2-34) is transformed into the frequency
domain. This can be accomplished by first noting that the summation in
(10-2-34) is the convolution of $\{c_j\}$ with $\{f_j\}$, evaluated at a shift of
zero. Thus, if $\{b_k\}$ denotes the convolution of these two sequences, the
summation in (10-2-34) is simply equal to $b_0$. Since the z transform of the
sequence $\{b_k\}$ is

$$B(z) = C(z)F(z) = \frac{F(z)F^*(z^{-1})}{F(z)F^*(z^{-1}) + N_0} = \frac{X(z)}{X(z) + N_0} \tag{10-2-35}$$

the term $b_0$ is

$$b_0 = \frac{1}{2\pi j} \oint \frac{B(z)}{z}\,dz = \frac{1}{2\pi j} \oint \frac{X(z)}{z[X(z) + N_0]}\,dz \tag{10-2-36}$$


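The contour integral in (10-2-36) can be transformed into an equivalent line
integral by the change of variable $z = e^{j\omega T}$. The result of this change of
variable is

$$b_0 = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} \frac{X(e^{j\omega T})}{X(e^{j\omega T}) + N_0}\,d\omega \tag{10-2-37}$$

Finally, substitution of the result in (10-2-37) for the summation in (10-2-34)
yields the desired expression for the minimum MSE in the form

$$J_{\min} = 1 - \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} \frac{X(e^{j\omega T})}{X(e^{j\omega T}) + N_0}\,d\omega
= \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} \frac{N_0}{X(e^{j\omega T}) + N_0}\,d\omega$$

$$\phantom{J_{\min}} = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} \frac{N_0}{T^{-1} \sum_{n=-\infty}^{\infty} |H(\omega + 2\pi n/T)|^2 + N_0}\,d\omega \tag{10-2-38}$$

In the absence of intersymbol interference, $X(e^{j\omega T}) = 1$ and, hence,

$$J_{\min} = \frac{N_0}{1 + N_0} \tag{10-2-39}$$

We observe that $0 \le J_{\min} \le 1$. Furthermore, the relationship between the
output (normalized by the signal energy) SNR $\gamma_\infty$ and $J_{\min}$ must be

$$\gamma_\infty = \frac{1 - J_{\min}}{J_{\min}} \tag{10-2-40}$$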

More importantly, this relation between $\gamma_\infty$ and $J_{\min}$ also holds
when there is residual intersymbol interference in addition to the noise.

Finite-Length Equalizer Let us now turn our attention to the case in
which the transversal equalizer spans a finite time duration. The output of the
equalizer in the kth signaling interval is

$$\hat{I}_k = \sum_{j=-K}^{K} c_j v_{k-j} \tag{10-2-41}$$

The MSE for the equalizer having 2K + 1 taps, denoted by J(K), is

$$J(K) = E|I_k - \hat{I}_k|^2 = E\left| I_k - \sum_{j=-K}^{K} c_j v_{k-j} \right|^2 \tag{10-2-42}$$

Minimization of J(K) with respect to the tap weights $\{c_j\}$ or, equivalently,
forcing the error $\varepsilon_k = I_k - \hat{I}_k$ to be orthogonal to the signal
samples $v_{k-l}^*$, $|l| \le K$, yields the following set of simultaneous equations:

$$\sum_{j=-K}^{K} c_j \Gamma_{lj} = \xi_l, \qquad l = -K, \ldots, -1, 0, 1, \ldots, K \tag{10-2-43}$$

where

$$\Gamma_{lj} = \begin{cases} x_{l-j} + N_0 \delta_{lj} & (|l - j| \le L) \\ 0 & (\text{otherwise}) \end{cases} \tag{10-2-44}$$

and

$$\xi_l = \begin{cases} f_{-l}^* & (-L \le l \le 0) \\ 0 & (\text{otherwise}) \end{cases} \tag{10-2-45}$$

It is convenient to express the set of linear equations in matrix form. Thus,

$$\boldsymbol{\Gamma}\mathbf{C} = \boldsymbol{\xi} \tag{10-2-46}$$

where C denotes the column vector of 2K + 1 tap weight coefficients,
$\boldsymbol{\Gamma}$ denotes the (2K + 1) x (2K + 1) Hermitian covariance matrix
with elements $\Gamma_{lj}$, and $\boldsymbol{\xi}$ is a (2K + 1)-dimensional column
vector with elements $\xi_l$. The solution of (10-2-46) is

$$\mathbf{C}_{\text{opt}} = \boldsymbol{\Gamma}^{-1}\boldsymbol{\xi} \tag{10-2-47}$$

Thus, the solution for $\mathbf{C}_{\text{opt}}$ involves inverting the matrix
$\boldsymbol{\Gamma}$. The optimum tap weight coefficients given by (10-2-47)
minimize the performance index J(K), with the result that the minimum value of
J(K) is

$$J_{\min}(K) = 1 - \sum_{j=-K}^{0} c_j f_{-j}
= 1 - \boldsymbol{\xi}^{\prime *} \boldsymbol{\Gamma}^{-1} \boldsymbol{\xi} \tag{10-2-48}$$

where $\boldsymbol{\xi}'$ represents the transpose of the column vector
$\boldsymbol{\xi}$. $J_{\min}(K)$ may be used in (10-2-40) to compute the output SNR
for the linear equalizer with 2K + 1 tap coefficients.
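The matrix solution can be sketched for a short channel with real taps. The code below is an illustration under stated assumptions: the taps are real and normalized to unit energy, `solve` is a plain Gaussian elimination written for this example, and the channel (0.8, 0.6) and $N_0 = 0.1$ are arbitrary choices. For a two-tap channel the standard integral of $1/(A + b\cos\theta)$ gives the infinite-tap value $N_0/\sqrt{N_0^2 + 2N_0 + (f_0^2 - f_1^2)^2}$, which is used here only as a reference.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def mse_equalizer(f, n0, K):
    """Return (taps c_{-K}..c_{K}, J_min(K)) for real channel taps f_0..f_L."""
    L = len(f) - 1

    def x(m):
        # Channel autocorrelation x_m = sum_n f_n f_{n+|m|}; zero for |m| > L.
        m = abs(m)
        return sum(f[n] * f[n + m] for n in range(L + 1 - m)) if m <= L else 0.0

    idx = range(-K, K + 1)
    gamma = [[x(l - j) + (n0 if l == j else 0.0) for j in idx] for l in idx]
    xi = [f[-l] if -L <= l <= 0 else 0.0 for l in idx]
    c = solve(gamma, xi)
    j_min = 1.0 - sum(ci * xii for ci, xii in zip(c, xi))
    return c, j_min

f, n0 = [0.8, 0.6], 0.1
_, j_1tap = mse_equalizer(f, n0, K=0)
_, j_21taps = mse_equalizer(f, n0, K=10)
j_inf = n0 / math.sqrt(n0 * n0 + 2.0 * n0 + (f[0] ** 2 - f[1] ** 2) ** 2)
snr_21taps = (1.0 - j_21taps) / j_21taps      # output SNR via (10-2-40)
```

In this example $J_{\min}(K)$ decreases as taps are added and, with 21 taps, is already very close to the infinite-tap reference.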

10-2-3 Performance Characteristics of the MSE Equalizer 

In this section, we consider the performance characteristics of the linear
equalizer that is optimized by using the MSE criterion. Both the minimum
MSE and the probability of error are considered as performance measures for
some specific channels. We begin by evaluating the minimum MSE $J_{\min}$ and the
output SNR $\gamma_\infty$ for two specific channels. Then, we consider the
evaluation of the probability of error.

Example 10-2-1 

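First, we consider an equivalent discrete-time channel model consisting of
two components $f_0$ and $f_1$, which are normalized to $|f_0|^2 + |f_1|^2 = 1$. Then

$$F(z) = f_0 + f_1 z^{-1} \tag{10-2-49}$$

and

$$X(z) = f_0 f_1^* z + 1 + f_0^* f_1 z^{-1} \tag{10-2-50}$$

The corresponding frequency response is

$$X(e^{j\omega T}) = f_0 f_1^* e^{j\omega T} + 1 + f_0^* f_1 e^{-j\omega T}
= 1 + 2|f_0||f_1| \cos(\omega T + \theta) \tag{10-2-51}$$

where $\theta$ is the angle of $f_0 f_1^*$. We note that this channel characteristic
possesses a null at $\omega = \pi/T$ when $f_0 = f_1 = \sqrt{\tfrac{1}{2}}$.

A linear equalizer with an infinite number of taps, adjusted on the basis
of the MSE criterion, will have the minimum MSE given by (10-2-38).
Evaluation of the integral in (10-2-38) for the $X(e^{j\omega T})$ given in (10-2-51)
yields the result

$$J_{\min} = \frac{N_0}{\sqrt{N_0^2 + 2N_0 + (|f_0|^2 - |f_1|^2)^2}} \tag{10-2-52}$$

For the special case $f_0 = f_1 = \sqrt{\tfrac{1}{2}}$, this gives
$J_{\min} = N_0/\sqrt{N_0^2 + 2N_0}$, and the corresponding output SNR from
(10-2-40) is

$$\gamma_\infty = \sqrt{1 + \frac{2}{N_0}} - 1 \approx \left(\frac{2}{N_0}\right)^{1/2}, \qquad N_0 \ll 1 \tag{10-2-53}$$

This result should be compared with the output SNR of $1/N_0$ obtained in
the case of no intersymbol interference. A significant loss in SNR occurs
from this channel.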
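The integral in (10-2-38) can be evaluated numerically for this two-tap channel and compared with the closed form $N_0/\sqrt{N_0^2 + 2N_0 + (f_0^2 - f_1^2)^2}$, which follows from the standard integral of $1/(A + b\cos\theta)$ over one period. The sketch assumes real taps with $\theta = 0$; the function names and the midpoint rule are our choices.

```python
import math

def j_min_numeric(f0, f1, n0, steps=20000):
    """(1 / 2 pi) * integral over theta in (-pi, pi) of N0 / (X(theta) + N0), theta = omega * T."""
    b = 2.0 * f0 * f1
    h = 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = -math.pi + (i + 0.5) * h      # midpoint rule
        total += n0 / (1.0 + b * math.cos(theta) + n0) * h
    return total / (2.0 * math.pi)

def j_min_closed(f0, f1, n0):
    """Closed form for the two-tap channel with f0^2 + f1^2 = 1."""
    return n0 / math.sqrt(n0 * n0 + 2.0 * n0 + (f0 * f0 - f1 * f1) ** 2)

n0 = 0.1
f_equal = math.sqrt(0.5)                      # equal taps: spectral null at omega = pi/T
j_num = j_min_numeric(f_equal, f_equal, n0)
j_ref = j_min_closed(f_equal, f_equal, n0)
gamma = (1.0 - j_num) / j_num                 # output SNR via (10-2-40)
gamma_no_isi = 1.0 / n0
```

At $N_0 = 0.1$ the equal-tap channel yields an output SNR of about 3.6, against 10 for a channel without intersymbol interference.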


Example 10-2-2 

As a second example, we consider an exponentially decaying characteristic
of the form

$$f_k = \sqrt{1 - a^2}\,a^k, \qquad k = 0, 1, \ldots$$

where a < 1. The Fourier transform of this sequence is

$$X(e^{j\omega T}) = \frac{1 - a^2}{1 + a^2 - 2a \cos \omega T}$$

which is a function that contains a minimum at $\omega = \pi/T$.
The output SNR for this channel is

$$\gamma_\infty = \left[ \sqrt{1 + 2N_0 \frac{1 + a^2}{1 - a^2} + N_0^2} - 1 \right]^{-1}
\approx \frac{1 - a^2}{(1 + a^2)N_0}, \qquad N_0 \ll 1 \tag{10-2-54}$$

Therefore, the loss in SNR due to the presence of the interference is

$$10 \log_{10} \left( \frac{1 - a^2}{1 + a^2} \right) \tag{10-2-55}$$
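The output SNR for the exponentially decaying channel can be checked with a short derivation. The steps below are our sketch rather than the text's; they write $\theta = \omega T$ and use the standard integral $\frac{1}{2\pi}\int_{-\pi}^{\pi} d\theta/(P - Q\cos\theta) = 1/\sqrt{P^2 - Q^2}$ for $P > |Q|$, where P and Q are shorthand introduced only for this purpose.

```latex
\begin{align*}
\frac{N_0}{X(e^{j\theta}) + N_0}
  &= \frac{N_0\,(1 + a^2 - 2a\cos\theta)}{P - Q\cos\theta},
  \qquad P = (1 - a^2) + N_0(1 + a^2),\quad Q = 2aN_0 \\
  &= 1 - \frac{1 - a^2}{P - Q\cos\theta} \\
J_{\min} &= \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{N_0}{X(e^{j\theta}) + N_0}\,d\theta
  = 1 - \frac{1 - a^2}{\sqrt{P^2 - Q^2}} \\
P^2 - Q^2 &= (1 - a^2)^2\left[1 + 2N_0\,\frac{1 + a^2}{1 - a^2} + N_0^2\right] \\
\gamma_\infty &= \frac{1 - J_{\min}}{J_{\min}}
  = \left[\sqrt{1 + 2N_0\,\frac{1 + a^2}{1 - a^2} + N_0^2} - 1\right]^{-1}
  \approx \frac{1 - a^2}{(1 + a^2)N_0}, \qquad N_0 \ll 1
\end{align*}
```

Relative to the value $1/N_0$ obtained without intersymbol interference, the small-$N_0$ approximation is scaled by the factor $(1 - a^2)/(1 + a^2)$.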


Probability of Error Performance of Linear MSE Equalizer Above, we
discussed the performance of the linear equalizer in terms of the minimum
achievable MSE $J_{\min}$ and the output SNR $\gamma_\infty$ that is related to
$J_{\min}$ through the formula in (10-2-40). Unfortunately, there is no simple
relationship between these quantities and the probability of error. The reason is
that the linear MSE equalizer contains some residual intersymbol interference at
its output. This situation is unlike that of the infinitely long zero-forcing
equalizer, for which there is no residual interference, but only gaussian noise.
The residual interference at the output of the MSE equalizer is not well
characterized as an additional gaussian noise term, and, hence, the output SNR
does not translate easily into an equivalent error probability.

One approach to computing the error probability is a brute force method
that yields an exact result. To illustrate this method, let us consider a PAM
signal in which the information symbols are selected from the set of values
2n - M - 1, n = 1, 2, ..., M, with equal probability. Now consider the decision
on the symbol $I_n$. The estimate of $I_n$ is

$$\hat{I}_n = q_0 I_n + \sum_{k \ne n} I_k q_{n-k} + \sum_{j=-K}^{K} c_j \eta_{n-j} \tag{10-2-56}$$

where $\{q_n\}$ represent the convolution of the impulse response of the equalizer
and equivalent channel, i.e.,

$$q_n = \sum_{k=-K}^{K} c_k f_{n-k} \tag{10-2-57}$$

and the input signal to the equalizer is

$$v_k = \sum_{j=0}^{L} f_j I_{k-j} + \eta_k \tag{10-2-58}$$

The first term in the right-hand side of (10-2-56) is the desired symbol, the
middle term is the intersymbol interference, and the last term is the gaussian
noise. The variance of the noise is

$$\sigma_n^2 = N_0 \sum_{j=-K}^{K} c_j^2 \tag{10-2-59}$$

For an equalizer with 2K + 1 taps and a channel response that spans L + 1
symbols, the number of symbols involved in the intersymbol interference is
2K + L.

Define

$$D = \sum_{k \ne n} I_k q_{n-k} \tag{10-2-60}$$

For a particular sequence of 2K + L information symbols, say the sequence
$\mathbf{I}_J$, the intersymbol interference term $D \equiv D_J$ is fixed. The
probability of error for a fixed $D_J$ is

$$P_M(D_J) = 2\,\frac{M - 1}{M}\,P(N + D_J > q_0)
= \frac{2(M - 1)}{M}\,Q\!\left( \sqrt{\frac{(q_0 - D_J)^2}{\sigma_n^2}} \right) \tag{10-2-61}$$

where N denotes the additive noise term. The average probability of error is
obtained by averaging $P_M(D_J)$ over all possible sequences $\mathbf{I}_J$. That is,

$$P_M = \sum_{\mathbf{I}_J} P_M(D_J) P(\mathbf{I}_J)
= \frac{2(M - 1)}{M} \sum_{\mathbf{I}_J} Q\!\left( \sqrt{\frac{(q_0 - D_J)^2}{\sigma_n^2}} \right) P(\mathbf{I}_J) \tag{10-2-62}$$

When all the sequences are equally likely,

$$P(\mathbf{I}_J) = \frac{1}{M^{2K+L}} \tag{10-2-63}$$
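The brute force average can be sketched in a few lines for real-valued PAM. In the code below the function names, the dictionary representation of $\{q_n\}$, and the example taps are assumptions made for illustration. The Q function is written in terms of the complementary error function, and the argument $(q_0 - D_J)/\sigma_n$ is used with its sign, which coincides with the square-root form whenever $q_0 > D_J$.

```python
import math
from itertools import product

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def noise_sigma(c, n0):
    """Standard deviation of the output noise, sigma_n^2 = N0 * sum c_j^2."""
    return math.sqrt(n0 * sum(ci * ci for ci in c))

def exact_error_probability(q, c, n0, M=2):
    """Average of the conditional error probabilities over all equally likely ISI sequences."""
    levels = [2 * n - M - 1 for n in range(1, M + 1)]
    isi_taps = [v for n, v in q.items() if n != 0]
    sigma = noise_sigma(c, n0)
    total = 0.0
    for seq in product(levels, repeat=len(isi_taps)):
        d = sum(s * t for s, t in zip(seq, isi_taps))
        total += q_func((q[0] - d) / sigma)
    return 2.0 * (M - 1) / M * total / M ** len(isi_taps)

def worst_case_probability(q, c, n0, M=2):
    """Conditional error probability for the largest ISI, D* = (M - 1) * sum |q_k|."""
    d_star = (M - 1) * sum(abs(v) for n, v in q.items() if n != 0)
    return 2.0 * (M - 1) / M * q_func((q[0] - d_star) / noise_sigma(c, n0))

# Hypothetical binary example: one residual ISI tap after a 5-tap equalizer.
q = {0: 1.0, 3: 0.125}
c = [0.0, 0.0, 1.0, -0.5, 0.25]
p_exact = exact_error_probability(q, c, n0=0.1)
p_worst = worst_case_probability(q, c, n0=0.1)
```

The number of terms in the average grows as M raised to the number of interfering symbols, which is why the method becomes impractical for long responses. The worst-case term computed alongside it is never smaller than the exact average.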


The conditional error probability terms $P_M(D_J)$ are dominated by the
sequence that yields the largest value of $D_J$. This occurs when
$I_n = \pm(M - 1)$ and the signs of the information symbols match the signs of the
corresponding $\{q_n\}$. Then,

$$D_J^* = (M - 1) \sum_{k \ne 0} |q_k|$$

and

$$P_M(D_J^*) = \frac{2(M - 1)}{M}\,Q\!\left( \sqrt{\frac{\left( q_0 - (M - 1)\sum_{k \ne 0} |q_k| \right)^2}{\sigma_n^2}} \right) \tag{10-2-64}$$

Thus, an upper bound on the average probability of error for equally likely
symbol sequences is

$$P_M \le P_M(D_J^*) \tag{10-2-65}$$

If the computation of the exact error probability in (10-2-62) proves to be
too cumbersome and too time consuming because of the large number of terms
in the sum and if the upper bound is too loose, one can resort to one of a
number of different approximate methods that have been devised, which are
known to yield tight bounds on $P_M$. A discussion of these different approaches
would take us too far afield. The interested reader is referred to the papers by
Saltzberg (1968), Lugannani (1969), Ho and Yeh (1970), Shimbo and Celebiler
(1971), Glave (1972), Yao (1972), and Yao and Tobin (1976).

As an illustration of the performance limitations of a linear equalizer in the
presence of severe intersymbol interference, we show in Fig. 10-2-4 the
probability of error for binary (antipodal) signaling, as measured by Monte
Carlo simulation, for the three discrete-time channel characteristics shown in
Fig. 10-2-5. For purposes of comparison, the performance obtained for a
channel with no intersymbol interference is also illustrated in Fig. 10-2-4. The
equivalent discrete-time channel shown in Fig. 10-2-5(a) is typical of the
response of a good quality telephone channel. In contrast, the equivalent
discrete-time channel characteristics shown in Fig. 10-2-5(b) and (c) result in
severe intersymbol interference. The spectral characteristics $|X(e^{j\omega})|$ for
the three channels, illustrated in Fig. 10-2-6, clearly show that the channel in
Fig. 10-2-5(c) has the worst spectral characteristic. Hence the performance of the
linear equalizer for this channel is the poorest of the three cases. Next in
performance is the channel shown in Fig. 10-2-5(b), and finally, the best
performance is obtained with the channel shown in Fig. 10-2-5(a). In fact, the
error rate of the latter is within 3 dB of the error rate achieved with no
interference.

FIGURE 10-2-4 Error rate performance of linear MSE equalizer.

FIGURE 10-2-5 Three discrete-time channel characteristics.

One conclusion reached from the results on output SNR $\gamma_\infty$ and the limited
probability of error results illustrated in Fig. 10-2-4 is that a linear equalizer
yields good performance on channels such as telephone lines, where the
spectral characteristics of the channels are well behaved and do not exhibit
spectral nulls. On the other hand, a linear equalizer is inadequate as a
compensator for the intersymbol interference on channels with spectral nulls,
which may be encountered in radio transmission.

FIGURE 10-2-6 Amplitude spectra for the channels shown in Figs 10-2-5(a), (b), and (c), respectively.

The basic limitation of the linear equalizer to cope with severe ISI has
motivated a considerable amount of research into nonlinear equalizers with
low computational complexity. The decision-feedback equalizer described in
Section 10-3 is shown to be an effective solution to this problem.


10-2-4 Fractionally Spaced Equalizers 

In the linear equalizer structures that we have described in the previous 
section, the equalizer taps are spaced at the reciprocal of the symbol rate, i.e., 
at the reciprocal of the signaling rate 1/T. This tap spacing is optimum if the
equalizer is preceded by a filter matched to the channel-distorted transmitted
pulse. When the channel characteristics are unknown, the receiver filter is
usually matched to the transmitted signal pulse and the sampling time is 
optimized for this suboptimum filter. In general, this approach leads to an 
equalizer performance that is very sensitive to the choice of sampling time. 

The limitations of the symbol rate equalizer are most easily evident in the
frequency domain. From (9-2-5), the spectrum of the signal at the input to the
equalizer may be expressed as

$$Y_T(f) = \sum_{n} X\!\left(f - \frac{n}{T}\right) e^{j2\pi(f - n/T)\tau_0} \tag{10-2-66}$$

where $Y_T(f)$ is the folded or aliased spectrum, where the folding frequency is
1/2T. Note that the received signal spectrum is dependent on the choice of the
sampling delay $\tau_0$. The signal spectrum at the output of the equalizer is
$C_T(f)Y_T(f)$, where

$$C_T(f) = \sum_{k=-K}^{K} c_k e^{-j2\pi fkT} \tag{10-2-67}$$

It is clear from these relationships that the symbol rate equalizer can only
compensate for the frequency response characteristics of the aliased received
signal. It cannot compensate for the channel distortion inherent in
$X(f)e^{j2\pi f\tau_0}$.

In contrast to the symbol rate equalizer, a fractionally spaced equalizer
(FSE) is based on sampling the incoming signal at least as fast as the Nyquist
rate. For example, if the transmitted signal consists of pulses having a raised
cosine spectrum with a roll-off factor $\beta$, its spectrum extends to
$F_{\max} = (1 + \beta)/2T$. This signal can be sampled at the receiver at a rate

$$2F_{\max} = \frac{1 + \beta}{T} \tag{10-2-68}$$

and then passed through an equalizer with tap spacing of $T/(1 + \beta)$. For
example, if $\beta = 1$, we would have a $\tfrac{1}{2}T$-spaced equalizer. If
$\beta = 0.5$, we would have a $\tfrac{2}{3}T$-spaced equalizer, and so forth. In
general, then, a digitally implemented fractionally spaced equalizer has tap
spacing of MT/N, where M and N are integers and N > M. Usually, a
$\tfrac{1}{2}T$-spaced equalizer is used in many applications.
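The tap spacing as a fraction M/N of the symbol interval follows directly from the roll-off factor. The short sketch below uses exact rational arithmetic; the function name is ours.

```python
from fractions import Fraction

def fse_tap_spacing(beta):
    """Tap spacing T' / T = 1 / (1 + beta), returned as an exact fraction M / N."""
    return 1 / (1 + Fraction(beta))

half = fse_tap_spacing(1)                      # roll-off 1
two_thirds = fse_tap_spacing(Fraction(1, 2))   # roll-off 0.5
```

The numerator and denominator of the result are the integers M and N of the text, with N > M for any positive roll-off.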

Since the frequency response of the FSE is

$$C_{T'}(f) = \sum_{k=-K}^{K} c_k e^{-j2\pi fkT'} \tag{10-2-69}$$

where T' = MT/N, it follows that $C_{T'}(f)$ can equalize the received signal
spectrum beyond the Nyquist frequency f = 1/2T to $f = (1 + \beta)/T = N/MT$.
The equalized spectrum is

$$C_{T'}(f)Y_{T'}(f) = C_{T'}(f) \sum_{n} X\!\left(f - \frac{n}{T'}\right) e^{j2\pi(f - n/T')\tau_0}
= C_{T'}(f) \sum_{n} X\!\left(f - \frac{nN}{MT}\right) e^{j2\pi(f - nN/MT)\tau_0} \tag{10-2-70}$$

Since X(f) = 0 for |f| > N/MT, (10-2-70) may be expressed as

$$C_{T'}(f)Y_{T'}(f) = C_{T'}(f)X(f)e^{j2\pi f\tau_0}, \qquad |f| \le \frac{1}{2T'} \tag{10-2-71}$$

Thus, we observe that the FSE compensates for the channel distortion in the
received signal before the aliasing effects due to symbol rate sampling. In other
words, $C_{T'}(f)$ can compensate for any arbitrary timing phase.

The FSE output is sampled at the symbol rate 1/T and has the spectrum

$$\sum_{k} C_{T'}\!\left(f - \frac{k}{T}\right) X\!\left(f - \frac{k}{T}\right) e^{j2\pi(f - k/T)\tau_0} \tag{10-2-72}$$

In effect, the optimum FSE is equivalent to the optimum linear receiver
consisting of the matched filter followed by a symbol rate equalizer.

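Let us now consider the adjustment of the tap coefficients in the FSE. The
input to the FSE may be expressed as

$$y\!\left(\frac{kMT}{N}\right) = \sum_n I_n\, x\!\left(\frac{kMT}{N} - nT\right) + \nu\!\left(\frac{kMT}{N}\right) \tag{10-2-73}$$

In each symbol interval, the FSE produces an output of the form

$$\hat{I}_k = \sum_{n=-K}^{K} c_n\, y\!\left(kT - \frac{nMT}{N}\right) \tag{10-2-74}$$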


where the coefficients of the equalizer are selected to minimize the MSE. This 
optimization leads to a set of linear equations for the equalizer coefficients that 
have the solution 

$$\mathbf{C}_{\text{opt}} = \mathbf{A}^{-1}\boldsymbol{\alpha} \tag{10-2-75}$$

where A is the covariance matrix of the input data and $\boldsymbol{\alpha}$ is the vector of
cross-correlations. These equations are identical in form to those for the
symbol rate equalizer, but there are some subtle differences. One is that A is 
Hermitian, but not Toeplitz. In addition, A exhibits periodicities that are 
inherent in a cyclostationary process, as shown by Qureshi (1985). As a result 
of the fractional spacing, some of the eigenvalues of A are nearly zero. 
Attempts have been made by Long et al. (1988a, b) to exploit this property in 
the coefficient adjustment. 
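
The symbol-rate output in (10-2-74) can be sketched in a few lines for the special case $M = 1$, so that the taps are spaced $T/N$ apart. This is only an illustration under stated assumptions: the name `fse_output`, the sample layout `y[i] = y(iT/N)`, and the choice to treat samples outside the record as zero are ours, not the text's.

```python
from typing import Sequence


def fse_output(y: Sequence[float], taps: Sequence[float], n: int, k: int) -> float:
    """Output of a T/n-spaced equalizer for symbol index k.

    y[i] is the received sample at time i*T/n (assumed layout), and
    taps[m + K] holds c_m for m = -K, ..., K, with len(taps) = 2K + 1.
    Samples outside the record are treated as zero (assumption).
    """
    big_k = (len(taps) - 1) // 2
    total = 0.0
    for m in range(-big_k, big_k + 1):
        index = k * n - m  # sample index corresponding to time kT - m*T/n
        if 0 <= index < len(y):
            total += taps[m + big_k] * y[index]
    return total


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # T/2-spaced samples (n = 2)
    identity = [0.0, 1.0, 0.0]                # c_0 = 1 simply passes y(kT) through
    print([fse_output(samples, identity, 2, k) for k in range(3)])
```

Note that the taps advance through the input in steps of $T/N$, while one output is produced per symbol interval $T$.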

An analysis of the performance of fractionally spaced equalizers, including 
their convergence properties, is given in a paper by Ungerboeck (1976). 
Simulation results demonstrating the effectiveness of the FSE over a symbol 
rate equalizer have also been given in the papers by Qureshi and Forney 
(1977) and Gitlin and Weinstein (1981). We cite two examples from these 
papers. First, Fig. 10-2-7 illustrates the performance of the symbol rate 
equalizer and a $\tfrac12 T$-FSE for a channel with high-end amplitude distortion,
whose characteristics are also shown in this figure. The symbol-spaced
equalizer was preceded by a filter matched to the transmitted pulse, which had
a (square-root) raised cosine spectrum with a 20% roll-off ($\beta = 0.2$). The FSE
did not have any filter preceding it. The symbol rate was 2400 symbols/s and
the modulation was QAM. The received SNR was 30 dB. Both equalizers had
31 taps; hence, the $\tfrac12 T$-FSE spanned one-half of the time interval of the
symbol rate equalizer. Nevertheless, the FSE outperformed the symbol rate
equalizer when the latter was optimized at the best sampling time.
Furthermore, the FSE did not exhibit any sensitivity to timing phase, as
illustrated in Fig. 10-2-7.

FIGURE 10-2-7 $T$ and $\tfrac12 T$ equalizer performance as a function of timing phase for 2400 symbols per second: (a) channel with high-end amplitude distortion (HA), plotted against frequency (Hz); (b) equalizer performance, plotted against time (symbol interval). (NRF indicates no receiver filter.) [From Qureshi and Forney (1977). © 1977 IEEE.]

Similar results were obtained by Gitlin and Weinstein. For a channel with
poor envelope delay characteristics, the SNR performance of the symbol rate
equalizer and a $\tfrac12 T$-FSE is illustrated in Fig. 10-2-8. In this case, both
equalizers had the same time span. The $T$-spaced equalizer had 24 taps while
the FSE had 48 taps. The symbol rate was 2400 symbols/s and the data rate
was 9600 bits/s with 16-QAM modulation. The signal pulse had a raised cosine
spectrum with $\beta = 0.12$. Note again that the FSE outperformed the $T$-spaced
equalizer by several decibels, even when the latter was adjusted for optimum
sampling. The results in these two papers clearly demonstrate the superior
performance achieved with a fractionally spaced equalizer.

FIGURE 10-2-8 Performance of $T$ and $\tfrac12 T$ equalizers as a function of timing phase for 2400 symbols/s 16-QAM on a channel with poor envelope delay. [From Gitlin and Weinstein (1981). Reprinted with permission from Bell System Technical Journal. © 1981 AT&T.]

FIGURE 10-3-1 Structure of decision-feedback equalizer.


10-3 DECISION-FEEDBACK EQUALIZATION 

The decision-feedback equalizer (DFE), depicted in Fig. 10-3-1, consists of two 
filters, a feedforward filter and a feedback filter. As shown, both have taps 
spaced at the symbol interval T. The input to the feedforward section is the 
received signal sequence $\{v_k\}$. In this respect, the feedforward filter is identical
to the linear transversal equalizer described in Section 10-2. The feedback filter 
has as its input the sequence of decisions on previously detected symbols. 
Functionally, the feedback filter is used to remove that part of the intersymbol 
interference from the present estimate caused by previously detected symbols. 


10-3-1 Coefficient Optimization 

From the description given above, it follows that the equalizer output can be
expressed as

$$\hat{I}_k = \sum_{j=-K_1}^{0} c_j v_{k-j} + \sum_{j=1}^{K_2} c_j \tilde{I}_{k-j} \tag{10-3-1}$$

where $\hat{I}_k$ is an estimate of the $k$th information symbol, $\{c_j\}$ are the tap
coefficients of the filter, and $\{\tilde{I}_{k-1}, \ldots, \tilde{I}_{k-K_2}\}$ are previously detected symbols.
The equalizer is assumed to have $(K_1 + 1)$ taps in its feedforward section and
$K_2$ in its feedback section. It should be observed that this equalizer is nonlinear
because the feedback filter contains previously detected symbols $\{\tilde{I}_k\}$.

Both the peak distortion criterion and the MSE criterion result in a
mathematically tractable optimization of the equalizer coefficients, as can be
concluded from the papers by George et al. (1971), Price (1972), Salz (1973),
and Proakis (1975). Since the MSE criterion is more prevalent in practice, we
focus our attention on it. Based on the assumption that previously detected
symbols in the feedback filter are correct, the minimization of MSE

$$J(K_1, K_2) = E|I_k - \hat{I}_k|^2 \tag{10-3-2}$$


leads to the following set of linear equations for the coefficients of the
feedforward filter:

$$\sum_{j=-K_1}^{0} \psi_{lj} c_j = f^*_{-l}, \qquad l = -K_1, \ldots, -1, 0 \tag{10-3-3}$$

where

$$\psi_{lj} = \sum_{m=0}^{-l} f^*_m f_{m+l-j} + N_0\delta_{lj}, \qquad l, j = -K_1, \ldots, -1, 0 \tag{10-3-4}$$

The coefficients of the feedback filter of the equalizer are given in terms of the
coefficients of the feedforward section by the following expression:

$$c_k = -\sum_{j=-K_1}^{0} c_j f_{k-j}, \qquad k = 1, 2, \ldots, K_2 \tag{10-3-5}$$

The values of the feedback coefficients result in complete elimination of
intersymbol interference from previously detected symbols, provided that
previous decisions are correct and that $K_2 \ge L$ (see Problem 10-9).


10-3-2 Performance Characteristics of DFE

We now turn our attention to the performance achieved with decision- 
feedback equalization. The exact evaluation of the performance is complicated 
to some extent by occasional incorrect decisions made by the detector, which 
then propagate down the feedback section. In the absence of decision errors, 
the minimum MSE is given as 


$$J_{\min}(K_1) = 1 - \sum_{j=-K_1}^{0} c_j f_{-j} \tag{10-3-6}$$
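
To make the coefficient equations concrete, the following sketch solves (10-3-3) through (10-3-5) for a real-valued channel and evaluates the minimum MSE in (10-3-6). It is an illustration under stated assumptions rather than a reference implementation: the helper names `solve_linear` and `dfe_coefficients` are hypothetical, the channel taps are assumed real (so the conjugates drop out), and a small Gaussian-elimination routine stands in for a linear-algebra library.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def dfe_coefficients(f, n0, k1, k2):
    """Return (feedforward taps c_{-K1..0}, feedback taps c_{1..K2}, J_min).

    f  : real channel taps f_0, ..., f_L (assumed real-valued)
    n0 : noise variance N_0
    """
    def tap(n):
        return f[n] if 0 <= n < len(f) else 0.0

    idx = list(range(-k1, 1))  # l, j = -K1, ..., -1, 0
    # psi_{lj} = sum_{m=0}^{-l} f_m f_{m+l-j} + N0 * delta_{lj}, as in (10-3-4)
    psi = [[sum(tap(m) * tap(m + l - j) for m in range(0, -l + 1))
            + (n0 if l == j else 0.0)
            for j in idx] for l in idx]
    rhs = [tap(-l) for l in idx]          # right-hand side f_{-l} of (10-3-3)
    ff = solve_linear(psi, rhs)           # ff[i] is c_j for j = idx[i]
    # c_k = -sum_j c_j f_{k-j}, k = 1, ..., K2, as in (10-3-5)
    fb = [-sum(cj * tap(k - j) for cj, j in zip(ff, idx))
          for k in range(1, k2 + 1)]
    # J_min = 1 - sum_j c_j f_{-j}, as in (10-3-6)
    j_min = 1.0 - sum(cj * tap(-j) for cj, j in zip(ff, idx))
    return ff, fb, j_min


if __name__ == "__main__":
    channel = [math.sqrt(0.5), math.sqrt(0.5)]  # illustrative two-tap channel
    for k1 in (0, 1, 5, 20):
        _, _, j = dfe_coefficients(channel, 0.1, k1, 1)
        print(f"K1 = {k1:2d}: J_min = {j:.5f}")
```

Running the loop for increasing $K_1$ shows the MSE decreasing toward its infinite-tap limit, which is the quantity considered next.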


By going to the limit ($K_1 \to \infty$) of an infinite number of taps in the feedforward
filter, we obtain the smallest achievable MSE, denoted as $J_{\min}$. With some
effort $J_{\min}$ can be expressed in terms of the spectral characteristics of the
channel and additive noise, as shown by Salz (1973). This more desirable form
for $J_{\min}$ is

$$J_{\min} = \exp\left\{\frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} \ln\left[\frac{N_0}{X(e^{j\omega T}) + N_0}\right] d\omega\right\} \tag{10-3-7}$$

The corresponding output SNR is

$$\gamma_\infty = \frac{1 - J_{\min}}{J_{\min}} = -1 + \exp\left\{\frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} \ln\left[\frac{N_0 + X(e^{j\omega T})}{N_0}\right] d\omega\right\} \tag{10-3-8}$$

We observe again that, in the absence of intersymbol interference,
$X(e^{j\omega T}) = 1$ and, hence, $J_{\min} = N_0/(1 + N_0)$. The corresponding output SNR is
$\gamma_\infty = 1/N_0$.
Example 10-3-1

It is interesting to compare the value of $J_{\min}$ for the decision-feedback
equalizer with the value of $J_{\min}$ obtained with the linear MSE equalizer. For
example, let us consider the discrete-time equivalent channel consisting of
two taps $f_0$ and $f_1$. The minimum MSE for this channel is

$$\begin{aligned}
J_{\min} &= \exp\left\{\frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} \ln\left[\frac{N_0}{1 + N_0 + 2|f_0||f_1|\cos(\omega T + \theta)}\right] d\omega\right\} \\
&= N_0 \exp\left[-\frac{1}{2\pi}\int_{-\pi}^{\pi} \ln(1 + N_0 + 2|f_0||f_1|\cos\omega)\, d\omega\right] \\
&= \frac{2N_0}{1 + N_0 + \sqrt{(1 + N_0)^2 - 4|f_0 f_1|^2}}
\end{aligned} \tag{10-3-9}$$

Note that $J_{\min}$ is maximized when $|f_0| = |f_1| = \sqrt{1/2}$. Then

$$J_{\min} = \frac{2N_0}{1 + N_0 + \sqrt{(1 + N_0)^2 - 1}} \approx 2N_0, \qquad N_0 \ll 1 \tag{10-3-10}$$

The corresponding output SNR is

$$\gamma_\infty \approx \frac{1}{2N_0}, \qquad N_0 \ll 1 \tag{10-3-11}$$

Therefore, there is a 3 dB degradation in output SNR due to the presence of
intersymbol interference. In comparison, the performance loss for the linear
equalizer is very severe. Its output SNR as given by (10-2-53) is
$\gamma_\infty \approx (2/N_0)^{1/2}$ for $N_0 \ll 1$.
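
The closed form in (10-3-9) can be checked numerically against the integral form in (10-3-7). The sketch below does so for a two-tap channel, assuming $T = 1$, real positive taps with $f_0^2 + f_1^2 = 1$, and zero phase $\theta$ (the phase does not change an integral taken over a full period). The function names and the midpoint-rule integration are our own choices for illustration.

```python
import math


def j_min_integral(f0: float, f1: float, n0: float, steps: int = 20000) -> float:
    """Evaluate (10-3-7) by the midpoint rule for a two-tap channel, T = 1.

    Assumes real taps with f0**2 + f1**2 = 1, so the folded spectrum is
    X(e^{jw}) = 1 + 2*f0*f1*cos(w).
    """
    total = 0.0
    for i in range(steps):
        w = -math.pi + (i + 0.5) * 2.0 * math.pi / steps
        total += math.log(n0 / (1.0 + n0 + 2.0 * f0 * f1 * math.cos(w)))
    # (1/2pi) * integral is approximated by the mean of the samples
    return math.exp(total / steps)


def j_min_closed(f0: float, f1: float, n0: float) -> float:
    """Closed-form expression (10-3-9)."""
    return 2.0 * n0 / (1.0 + n0 + math.sqrt((1.0 + n0) ** 2 - 4.0 * (f0 * f1) ** 2))


if __name__ == "__main__":
    f = math.sqrt(0.5)
    for n0 in (0.1, 0.01):
        numeric = j_min_integral(f, f, n0)
        closed = j_min_closed(f, f, n0)
        snr = (1.0 - closed) / closed
        print(f"N0 = {n0}: integral = {numeric:.6f}, closed form = {closed:.6f}, "
              f"output SNR = {snr:.2f}")
```

Comparing the printed SNR with $1/2N_0$ also gives a feel for how small $N_0$ must be before the approximation in (10-3-11) becomes accurate.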


Example 10-3-2 

Consider the exponentially decaying channel characteristic of the form

$$f_k = (1 - a^2)^{1/2} a^k, \qquad k = 0, 1, 2, \ldots \tag{10-3-12}$$

where $a < 1$. The output SNR of the decision-feedback equalizer is

$$\begin{aligned}
\gamma_\infty &= -1 + \exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \ln\left[\frac{1 + a^2 + (1 - a^2)/N_0 - 2a\cos\omega}{1 + a^2 - 2a\cos\omega}\right] d\omega\right\} \\
&= -1 + \frac{1}{2N_0}\left\{1 - a^2 + N_0(1 + a^2) + \sqrt{[1 - a^2 + N_0(1 + a^2)]^2 - 4a^2N_0^2}\right\} \\
&\approx \frac{(1 - a^2)[1 + N_0(1 + a^2)/(1 - a^2)] - N_0}{N_0} \\
&\approx \frac{1 - a^2}{N_0}, \qquad N_0 \ll 1
\end{aligned} \tag{10-3-13}$$

Thus, the loss in SNR is $10\log_{10}(1 - a^2)$ dB. In comparison, the linear
equalizer has a loss of $10\log_{10}[(1 - a^2)/(1 + a^2)]$ dB.
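
The two loss expressions are easy to tabulate. The following sketch evaluates them for a few values of $a$ and also compares the exact closed form in (10-3-13) with its small-$N_0$ approximation. The function names and the sample values of $a$ and $N_0$ are chosen here purely for illustration.

```python
import math


def dfe_loss_db(a: float) -> float:
    """SNR loss of the infinite-tap DFE: 10 log10(1 - a^2) dB."""
    return 10.0 * math.log10(1.0 - a * a)


def linear_loss_db(a: float) -> float:
    """SNR loss of the linear equalizer: 10 log10[(1 - a^2)/(1 + a^2)] dB."""
    return 10.0 * math.log10((1.0 - a * a) / (1.0 + a * a))


def dfe_snr_exact(a: float, n0: float) -> float:
    """Exact closed form for the DFE output SNR (second line of (10-3-13))."""
    b = 1.0 - a * a + n0 * (1.0 + a * a)
    return -1.0 + (b + math.sqrt(b * b - 4.0 * a * a * n0 * n0)) / (2.0 * n0)


if __name__ == "__main__":
    for a in (0.3, 0.5, 0.9):
        print(f"a = {a}: DFE loss = {dfe_loss_db(a):.2f} dB, "
              f"linear loss = {linear_loss_db(a):.2f} dB")
    a, n0 = 0.5, 1e-3  # illustrative operating point
    print(f"exact SNR = {dfe_snr_exact(a, n0):.2f}, "
          f"approximation (1 - a^2)/N0 = {(1.0 - a * a) / n0:.2f}")
```

Since $1 + a^2 > 1$, the linear-equalizer loss is always the larger of the two, and the gap widens as $a$ approaches 1.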

These results illustrate the superiority of the decision-feedback equalizer
over the linear equalizer when the effect of decision errors on performance is
neglected. It is apparent that a considerable gain in performance can be
achieved relative to the linear equalizer by the inclusion of the decision-feedback
section, which eliminates the intersymbol interference from previously
detected symbols.

One method of assessing the effect of decision errors on the error rate 
performance of the decision-feedback equalizer is Monte Carlo simulation on a 
digital computer. For purposes of illustration, we offer the following results for 
binary PAM signaling through the equivalent discrete-time channel models 
shown in Figs. 10-2-5(b) and (c).

The results of the simulation are displayed in Fig. 10-3-2. First of all, a 
comparison of these results with those presented in Fig. 10-2-4 leads us to 
conclude that the decision-feedback equalizer yields a significant improvement 
in performance relative to the linear equalizer having the same number of taps. 
Second, these results indicate that there is still a significant degradation in 
performance of the decision-feedback equalizer due to the residual intersymbol
interference, especially on channels with severe distortion such as the one
shown in Fig. 10-2-5(c). Finally, the performance loss due to incorrect
decisions being fed back is approximately 2 dB for the channel responses
under consideration. Additional results on the probability of error for a
decision-feedback equalizer with error propagation may be found in the papers
by Duttweiler et al. (1974) and Beaulieu (1992).

FIGURE 10-3-2 Performance of decision-feedback equalizer with and without error propagation. (Horizontal axis: SNR, $10\log\gamma$ (dB).)
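
The kind of Monte Carlo experiment described above can be sketched with a deliberately simplified setup. The code below is not the simulation that produced Fig. 10-3-2: it assumes a hypothetical two-tap channel with known taps, binary PAM, real Gaussian noise of variance $N_0$, and a one-tap feedback section with no feedforward filtering. The name `simulate_dfe` and all parameter values are illustrative. Setting `ideal_feedback=True` feeds back the true symbols, which removes error propagation.

```python
import math
import random


def simulate_dfe(f0: float, f1: float, n0: float, num_symbols: int,
                 ideal_feedback: bool, seed: int = 1) -> float:
    """Estimate the symbol error rate of a one-tap DFE on a two-tap channel.

    Received sample: v_k = f0*I_k + f1*I_{k-1} + noise (noise variance n0).
    Decision variable: z_k = v_k - f1 * (fed-back symbol).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(n0)
    prev_true = 1   # assumed known starting symbol
    prev_fed = 1    # symbol held in the feedback tap
    errors = 0
    for _ in range(num_symbols):
        sym = rng.choice((-1, 1))
        v = f0 * sym + f1 * prev_true + rng.gauss(0.0, sigma)
        z = v - f1 * prev_fed            # cancel ISI from the previous symbol
        decision = 1 if z >= 0.0 else -1
        if decision != sym:
            errors += 1
        prev_true = sym
        # with ideal feedback the true symbol is fed back (no error propagation)
        prev_fed = sym if ideal_feedback else decision
    return errors / num_symbols


if __name__ == "__main__":
    f = math.sqrt(0.5)
    for n0 in (0.2, 0.1, 0.05):
        with_prop = simulate_dfe(f, f, n0, 200000, ideal_feedback=False)
        without = simulate_dfe(f, f, n0, 200000, ideal_feedback=True)
        print(f"N0 = {n0}: detected-symbol feedback = {with_prop:.5f}, "
              f"correct-symbol feedback = {without:.5f}")
```

Because both runs use the same seed, they see the same symbols and noise, so any difference between the two error rates is due to decisions being fed back.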

The structure of the DFE that is analyzed above employs a T-spaced filter 
for the feedforward section. The optimality of such a structure is based on the 
assumption that the analog filter preceding the DFE is matched to the 
channel-corrupted pulse response and its output is sampled at the optimum 
time instant. In practice, the channel response is not known a priori, so it is not 
possible to design an ideal matched filter. In view of this difficulty, it is 
customary in practical applications to use a fractionally spaced feedforward 
filter. Of course, the feedback filter tap spacing remains at T. The use of the 
FSE for the feedforward filter eliminates the system sensitivity to a timing 
error. 

Performance Comparison with MLSE We conclude this subsection on the
performance of the DFE by comparing its performance against that of MLSE.
For the two-path channel with $f_0 = f_1 = \sqrt{1/2}$, we have shown that MLSE suffers
no SNR loss while the decision-feedback equalizer suffers a 3 dB loss. On
channels with more distortion, the SNR advantage of MLSE over decision-feedback
equalization is even greater. Figure 10-3-3 illustrates a comparison of
the error rate performance of these two equalization techniques, obtained via
Monte Carlo simulation, for binary PAM and the channel characteristics
shown in Figs. 10-2-5(b) and (c). The error rate curves for the two methods
have different slopes; hence the difference in SNR increases as the error
probability decreases. As a benchmark, the error rate for the AWGN channel
with no intersymbol interference is also shown in Fig. 10-3-3.

FIGURE 10-3-3 Comparison of performance between MLSE and decision-feedback equalization for channel
characteristics shown (a) in Fig. 10-2-5(b) and (b) in Fig. 10-2-5(c).

FIGURE 10-3-4 Block diagram of predictive DFE.


10-3-3 Predictive Decision-Feedback Equalizer 

Belfiore and Park (1979) proposed another DFE structure that is equivalent to 
the one shown in Fig. 10-3-1 under the condition that the feedforward filter has 
an infinite number of taps. This structure consists of an FSE as a feedforward
filter and a linear predictor as a feedback filter, as shown in the configuration 
given in Fig. 10-3-4. Let us briefly consider the performance characteristics of 
this equalizer. 

First of all, the noise at the output of the infinite length feedforward filter
has the power spectral density

$$\frac{N_0 X(e^{j\omega T})}{[N_0 + X(e^{j\omega T})]^2}, \qquad |\omega| \le \frac{\pi}{T} \tag{10-3-14}$$

The residual intersymbol interference has the power spectral density

$$\left|1 - \frac{X(e^{j\omega T})}{N_0 + X(e^{j\omega T})}\right|^2 = \frac{N_0^2}{|N_0 + X(e^{j\omega T})|^2}, \qquad |\omega| \le \frac{\pi}{T} \tag{10-3-15}$$

The sum of these two spectra represents the power spectral density of the total
noise and intersymbol interference at the output of the feedforward filter.
Thus, on adding (10-3-14) and (10-3-15), we obtain

$$|E(\omega)|^2 = \frac{N_0}{N_0 + X(e^{j\omega T})}, \qquad |\omega| \le \frac{\pi}{T} \tag{10-3-16}$$
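
As a brief check of this sum (a worked step added here for clarity, using only the noise and residual-ISI spectra stated above), note that $X(e^{j\omega T})$ is real and nonnegative, so both terms share the denominator $[N_0 + X(e^{j\omega T})]^2$:

$$\frac{N_0 X(e^{j\omega T})}{[N_0 + X(e^{j\omega T})]^2} + \frac{N_0^2}{[N_0 + X(e^{j\omega T})]^2} = \frac{N_0[X(e^{j\omega T}) + N_0]}{[N_0 + X(e^{j\omega T})]^2} = \frac{N_0}{N_0 + X(e^{j\omega T})}$$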


As we have observed previously, if $X(e^{j\omega T}) = 1$, the channel is ideal and,
hence, it is not possible to reduce the MSE any further. On the other hand, if
there is channel distortion, the power in the error sequence at the output of the 
feedforward filter can be reduced by means of linear prediction based on past 
values of the error sequence. 

If $\mathcal{B}(\omega)$ represents the frequency response of the infinite length feedback
predictor, i.e.,

$$\mathcal{B}(\omega) = \sum_{n=1}^{\infty} b_n e^{-j\omega nT} \tag{10-3-17}$$

then the error at the output of the predictor is

$$E(\omega) - E(\omega)\mathcal{B}(\omega) = E(\omega)[1 - \mathcal{B}(\omega)] \tag{10-3-18}$$

The minimization of the mean square value of this error, i.e.,

$$J = \frac{1}{2\pi}\int_{-\pi/T}^{\pi/T} |1 - \mathcal{B}(\omega)|^2 |E(\omega)|^2\, d\omega \tag{10-3-19}$$

over the predictor coefficients $\{b_n\}$ yields the optimum predictor in the form

$$\mathcal{B}(\omega) = 1 - \frac{G(\omega)}{g_0} \tag{10-3-20}$$

where $G(\omega)$ is the solution to the spectral factorization

$$G(\omega)G^*(-\omega) = \frac{1}{|E(\omega)|^2} \tag{10-3-21}$$

and

$$G(\omega) = \sum_{n=0}^{\infty} g_n e^{-j\omega nT} \tag{10-3-22}$$

The output of the infinite length linear predictor is a white noise sequence with
power spectral density $1/g_0^2$ and the corresponding minimum MSE is given by
(10-3-7). Therefore, the MSE performance of the infinite-length predictive
DFE is identical to the conventional DFE.

Although these two DFE structures result in equivalent performance if their 
lengths are infinite, the predictive DFE is suboptimum if the lengths of the two 
filters are finite. The reason for the optimality of the conventional DFE is 
relatively simple. The optimization of its tap coefficients in the feedforward 
and feedback filters is done jointly. Hence, it yields the minimum MSE. On the 
other hand, the optimizations of the feedforward filter and the feedback 
predictor in the predictive DFE are done separately. Hence, its MSE is at least 
as large as that of the conventional DFE. In spite of this suboptimality of the
predictive DFE, it is suitable as an equalizer for trellis-coded signals, where the 
conventional DFE is not as suitable, as described in the next chapter. 
10-4 BIBLIOGRAPHICAL NOTES AND REFERENCES

Channel equalization for digital communications was developed by Lucky 
(1965, 1966), who focused on linear equalizers that were optimized using the 
peak distortion criterion. The mean square error criterion for optimization of 
the equalizer coefficients was proposed by Widrow (1966). 

Decision-feedback equalization was proposed and analyzed by Austin 
(1967). Analyses of the performance of the DFE can be found in the papers by 
Monsen (1971), George et al. (1971), Price (1972), Salz (1973), Duttweiler et al. 
(1974), and Altekar and Beaulieu (1993). 

The use of the Viterbi algorithm as the optimal maximum-likelihood 
sequence estimator for symbols corrupted by ISI was proposed and analyzed
by Forney (1972) and Omura (1971). Its use for carrier-modulated signals was
considered by Ungerboeck (1974) and MacKechnie (1973).


PROBLEMS 

10-1 In a binary PAM system, the input to the detector is

$$y_m = a_m + n_m + i_m$$

where $a_m = \pm 1$ is the desired signal, $n_m$ is a zero-mean Gaussian random variable
with variance $\sigma_n^2$, and $i_m$ represents the ISI due to channel distortion. The ISI
term is a random variable that takes the values $-\tfrac12$, 0, and $\tfrac12$ with probabilities
$\tfrac14$, $\tfrac12$, and $\tfrac14$, respectively. Determine the average probability of error as a function
of $\sigma_n^2$.

10-2 In a binary PAM system, the clock that specifies the sampling of the correlator 
output is offset from the optimum sampling time by 10%. 
a If the signal pulse used is rectangular, determine the loss in SNR due to the 
mistiming. 

b Determine the amount of ISI introduced by the mistiming and determine its 
effect on performance. 

10-3 The frequency response characteristic of a lowpass channel can be approximated
by

$$H(f) = \begin{cases} 1 + \alpha\cos 2\pi f t_0 & (|\alpha| < 1,\ |f| \le W) \\ 0 & (\text{otherwise}) \end{cases}$$

where $W$ is the channel bandwidth. An input signal $s(t)$ whose spectrum is
bandlimited to $W$ Hz is passed through the channel.
a Show that

$$y(t) = s(t) + \tfrac12\alpha[s(t - t_0) + s(t + t_0)]$$

Thus, the channel produces a pair of echoes.
b Suppose that the received signal $y(t)$ is passed through a filter matched to $s(t)$.
Determine the output of the matched filter at $t = kT$, $k = 0, \pm 1, \pm 2, \ldots$,
where $T$ is the symbol duration.
c What is the ISI pattern resulting from the channel if $t_0 = T$?

10-4 A wireline channel of length 1000 km is used to transmit data by means of binary
PAM. Regenerative repeaters are spaced 50 km apart along the system. Each
segment of the channel has an ideal (constant) frequency response over the
frequency band $0 \le f \le 1200$ Hz and an attenuation of 1 dB/km. The channel
noise is AWGN.

a What is the highest bit rate that can be transmitted without ISI?
b Determine the required $\mathcal{E}_b/N_0$ to achieve a bit error of $P_2 = 10^{-7}$ for each
repeater.

c Determine the transmitted power at each repeater to achieve the desired
$\mathcal{E}_b/N_0$, where $N_0 = 4.1 \times 10^{-21}$ W/Hz.

10-5 Prove the relationship in (10-1-13) for the autocorrelation of the noise at the 
output of the matched filter.

10-6 In the case of PAM with correlated noise, the correlation metrics in the Viterbi
algorithm may be expressed in general as (Ungerboeck, 1974)

$$CM(\mathbf{I}) = 2\sum_n I_n r_n - \sum_n \sum_m I_n I_m x_{n-m}$$

where $x_n = x(nT)$ is the sampled signal output of the matched filter, $\{I_n\}$ is the
data sequence, and $\{r_n\}$ is the received signal sequence at the output of the
matched filter. Determine the metric for the duobinary signal.

10-7 Consider the use of a (square-root) raised cosine signal pulse with a roll-off factor
of unity for transmission of binary PAM over an ideal bandlimited channel that
passes the pulse without distortion. Thus, the transmitted signal is

$$v(t) = \sum_k I_k g_T(t - kT_b)$$

where the signal interval $T_b = \tfrac12 T$. Thus, the symbol rate is double that for no
ISI.

a Determine the ISI values at the output of a matched filter demodulator, 
b Sketch the trellis for the maximum-likelihood sequence detector and label the 
states. 

10-8 A binary antipodal signal is transmitted over a nonideal band-limited channel, 
which introduces ISI over two adjacent symbols. For an isolated transmitted 
signal pulse $s(t)$, the (noise-free) output of the demodulator is $\sqrt{\mathcal{E}_b}$ at $t = T$,
$\sqrt{\mathcal{E}_b}/4$ at $t = 2T$, and zero for $t = kT$, $k > 2$, where $\mathcal{E}_b$ is the signal energy and $T$ is
the signaling interval.

a Determine the average probability of error, assuming that the two signals are 
equally probable and the additive noise is white and gaussian. 
b By plotting the error probability obtained in (a) and that for the case of no ISI, 
determine the relative difference in SNR of the error probability of 10 

10-9 Derive the expression in (10-3-5) for the coefficients in the feedback filter of the
DFE. 

10-10 Binary PAM is used to transmit information over an unequalized linear filter
channel. When $a = 1$ is transmitted, the noise-free output of the demodulator is

$$x_m = \begin{cases} 0.3 & (m = 1) \\ 0.9 & (m = 0) \\ 0.3 & (m = -1) \\ 0 & (\text{otherwise}) \end{cases}$$

a Design a three-tap zero-forcing linear equalizer so that the output is

$$q_m = \begin{cases} 1 & (m = 0) \\ 0 & (m = \pm 1) \end{cases}$$

b Determine $q_m$ for $m = \pm 2, \pm 3$, by convolving the impulse response of the
equalizer with the channel response.

10-11 The transmission of a signal pulse with a raised cosine spectrum through a
channel results in the following (noise-free) sampled output from the
demodulator:

$$x_k = \begin{cases} -0.5 & (k = -2) \\ 0.1 & (k = -1) \\ 1 & (k = 0) \\ -0.2 & (k = 1) \\ 0.05 & (k = 2) \\ 0 & (\text{otherwise}) \end{cases}$$

a Determine the tap coefficients of a three-tap linear equalizer based on the
zero-forcing criterion.

b For the coefficients determined in (a), determine the output of the equalizer
for the case of the isolated pulse. Thus, determine the residual ISI and its span
in time.

10-12 A nonideal band-limited channel introduces ISI over three successive symbols.
The (noise-free) response of the matched filter demodulator sampled at the
sampling time $kT$ is

$$\int_{-\infty}^{\infty} s(t)s(t - kT)\, dt = \begin{cases} \mathcal{E}_b & (k = 0) \\ 0.9\mathcal{E}_b & (k = \pm 1) \\ 0.1\mathcal{E}_b & (k = \pm 2) \\ 0 & (\text{otherwise}) \end{cases}$$

a Determine the tap coefficients of a three-tap linear equalizer that equalizes the
channel (received signal) response to an equivalent partial response (duobinary)
signal

$$y_k = \begin{cases} \mathcal{E}_b & (k = 0, 1) \\ 0 & (\text{otherwise}) \end{cases}$$

b Suppose that the linear equalizer in (a) is followed by a Viterbi sequence
detector for the partial signal. Give an estimate of the error probability if the
additive noise is white and gaussian, with power spectral density $\tfrac12 N_0$ W/Hz.

10-13 Determine the tap weight coefficients of a three-tap zero-forcing equalizer if the
ISI spans three symbols and is characterized by the values $x(0) = 1$, $x(-1) = 0.3$,
$x(1) = 0.2$. Also determine the residual ISI at the output of the equalizer for the
optimum tap coefficients.

10-14 In line-of-sight microwave radio transmission, the signal arrives at the receiver 
via two propagation paths: the direct path and a delayed path that occurs due to 
signal reflection from surrounding terrain. Suppose that the received signal has 
the form 


$$r(t) = s(t) + \alpha s(t - T) + n(t)$$

where $s(t)$ is the transmitted signal, $\alpha$ is the attenuation ($\alpha < 1$) of the secondary
path and $n(t)$ is AWGN.

a Determine the output of the demodulator at $t = T$ and $t = 2T$ that employs a
filter matched to $s(t)$.

b Determine the probability of error for a symbol-by-symbol detector if the
transmitted signal is binary antipodal and the detector ignores the ISI.
c What is the error-rate performance of a simple (one-tap) DFE that estimates $\alpha$
and removes the ISI? Sketch the detector structure that employs a DFE.

10-15 Repeat Problem 10-10 using the MMSE as the criterion for optimizing the tap 
coefficients. Assume that the noise power spectral density is 0.1 W/Hz. 

10-16 In a magnetic recording channel, where the readback pulse resulting from a
positive transition in the write current has the form

$$p(t) = \frac{1}{1 + (2t/T_{50})^2}$$

a linear equalizer is used to equalize the pulse to a partial response. The
parameter $T_{50}$ is defined as the width of the pulse at the 50% amplitude level.
The bit rate is $1/T_b$, and the ratio $T_{50}/T_b = \Delta$ is the normalized density of the
recording. Suppose the pulse is equalized to the partial-response values

$$x(nT) = \begin{cases} 1 & (n = -1, 1) \\ 2 & (n = 0) \\ 0 & (\text{otherwise}) \end{cases}$$

where $x(t)$ represents the equalized pulse shape.
a Determine the spectrum $X(f)$ of the band-limited equalized pulse.
b Determine the possible output levels at the detector, assuming that successive
transitions can occur at the rate $1/T_b$.

c Determine the error rate performance of the symbol-by-symbol detector for
this signal, assuming that the additive noise is zero-mean gaussian with
variance $\sigma^2$.

10-17 Sketch the trellis for the Viterbi detector of the equalized signal in Problem 10-16 
and label all the states. Also, determine the minimum euclidean distance between 
merging paths. 

10-18 Consider the problem of equalizing the discrete-time equivalent channel shown 
in Fig. P10-18. The information sequence {/„} is binary (±1) and uncorrelated. 



FIGURE P10-18

FIGURE P10-21

The additive noise $\{\nu_n\}$ is white and real-valued, with variance $N_0$. The received
sequence $\{y_n\}$ is processed by a linear three-tap equalizer that is optimized on the
basis of the MSE criterion.

a Determine the optimum coefficients of the equalizer as a function of $N_0$.
b Determine the three eigenvalues $\lambda_1$, $\lambda_2$, and $\lambda_3$ of the covariance matrix $\Gamma$ and
the corresponding (normalized to unit length) eigenvectors $v_1$, $v_2$, $v_3$.
c Determine the minimum MSE for the three-tap equalizer as a function of $N_0$.
d Determine the output SNR for the three-tap equalizer as a function of $N_0$.
How does this compare with the output SNR for the infinite-tap equalizer? For
example, evaluate the output SNR for these two equalizers when $N_0 = 0.1$.

10-19 Use the orthogonality principle to derive the equations for the coefficients in a 
decision-feedback equalizer based on the MSE criterion and given by (10-3-3) 
and (10-3-5). 

10-20 Suppose that the discrete-time model for the intersymbol interference is
characterized by the tap coefficients $f_0, f_1, \ldots, f_L$. From the equations for the tap
coefficients of a decision-feedback equalizer (DFE), show that only $L$ taps are
needed in the feedback filter of the DFE. That is, if $\{c_k\}$ are the coefficients of the
feedback filter then $c_k = 0$ for $k \ge L + 1$.

10-21 Consider the channel model shown in Fig. P10-21. $\{\nu_n\}$ is a real-valued
white-noise sequence with zero mean and variance $N_0$. Suppose the channel is to
be equalized by a DFE having a two-tap feedforward filter $(c_0, c_{-1})$ and a one-tap
feedback filter $(c_1)$. The $\{c_i\}$ are optimized using the MSE criterion.
a Determine the optimum coefficients and their approximate values for $N_0 \ll 1$.
b Determine the exact value of the minimum MSE and a first-order approximation
appropriate to the case $N_0 \ll 1$.

c Determine the exact value of the output SNR for the three-tap equalizer as a
function of $N_0$ and a first-order approximation appropriate to the case $N_0 \ll 1$.
d Compare the results in (b) and (c) with the performance of the infinite-tap
DFE.

e Evaluate and compare the exact values of the output SNR for the three-tap
and infinite-tap DFE in the special cases where $N_0 = 0.1$ and 0.01. Comment on
how well the three-tap equalizer performs relative to the infinite-tap equalizer.

10-22 A pulse and its (raised-cosine) spectral characteristic are shown in Fig. P10-22.
This pulse is used for transmitting digital information over a band-limited
channel at a rate $1/T$ symbols/s.

FIGURE P10-22

a What is the roll-off factor $\beta$?
b What is the pulse rate?

c The channel distorts the signal pulses. Suppose the sampled values of the
filtered received pulse $x(t)$ are as shown in Fig. P10-22(c). It is obvious that
there are five interfering signal components. Give the sequence of +1s and −1s
that will cause the largest (destructive or constructive) interference and the
corresponding value of the interference (the peak distortion).
d What is the probability of occurrence of the worst sequence obtained in (c),
assuming that all binary digits are equally probable and independent?

10-23 A time-dispersive channel having an impulse response $h(t)$ is used to transmit
four-phase PSK at a rate $R = 1/T$ symbols/s. The equivalent discrete-time
channel is shown in Fig. P10-23. The sequence $\{\eta_k\}$ is a white noise sequence
having zero mean and variance $\sigma^2 = N_0$.

FIGURE P10-23

FIGURE P10-24

a What is the sampled autocorrelation function sequence $\{x_k\}$ defined by

$$x_k = \int_{-\infty}^{\infty} h^*(t)h(t + kT)\, dt$$

for this channel?

b The minimum MSE performance of a linear equalizer and a decision-feedback
equalizer having an infinite number of taps depends on the folded spectrum of
the channel

$$\frac{1}{T}\sum_{n=-\infty}^{\infty} \left|H\!\left(\omega + \frac{2\pi n}{T}\right)\right|^2$$

where $H(\omega)$ is the Fourier transform of $h(t)$. Determine the folded spectrum
of the channel given above.

c Use your answer in (b) to express the minimum MSE of a linear equalizer in 
terms of the folded spectrum of the channel. (You may leave your answer in 
integral form.) 

d Repeat (c) for an infinite-tap decision-feedback equalizer. 

10-24 Consider a four-level PAM system with possible transmitted levels 3, 1, −1, and
−3. The channel through which the data are transmitted introduces intersymbol
interference over two successive symbols. The equivalent discrete-time channel
model is shown in Fig. P10-24. $\{\eta_k\}$ is a sequence of real-valued independent
zero-mean gaussian noise variables with variance $\sigma^2 = N_0$. The received sequence
is

$$\begin{aligned}
y_1 &= 0.8I_1 + \eta_1 \\
y_2 &= 0.8I_2 - 0.6I_1 + \eta_2 \\
y_3 &= 0.8I_3 - 0.6I_2 + \eta_3 \\
&\;\;\vdots \\
y_k &= 0.8I_k - 0.6I_{k-1} + \eta_k
\end{aligned}$$

a Sketch the tree structure, showing the possible signal sequences for the
received signals $y_1$, $y_2$, and $y_3$.

b Suppose the Viterbi algorithm is used to detect the information sequence. How
many probabilities must be computed at each stage of the algorithm?
c How many surviving sequences are there in the Viterbi algorithm for this
channel?

d Suppose that the received signals are

$$y_1 = 0.5, \quad y_2 = 2.0, \quad y_3 = -1.0$$

Determine the surviving sequences through stage $y_3$ and the corresponding
metrics.

e Give a tight upper bound for the probability of error for four-level PAM 
transmitted over this channel. 

10-25 A transversal equalizer with $K$ taps has an impulse response

$$e(t) = \sum_{k=0}^{K-1} c_k\delta(t - kT)$$

where $T$ is the delay between adjacent taps, and a transfer function

$$E(z) = \sum_{k=0}^{K-1} c_k z^{-k}$$

The discrete Fourier transform (DFT) of the equalizer coefficients $\{c_k\}$ is defined
as

$$E_n = E(z)\big|_{z = e^{j2\pi n/K}} = \sum_{k=0}^{K-1} c_k e^{-j2\pi kn/K}, \qquad n = 0, 1, \ldots, K - 1$$

The inverse DFT is defined as

$$b_k = \frac{1}{K}\sum_{n=0}^{K-1} E_n e^{j2\pi nk/K}, \qquad k = 0, 1, \ldots, K - 1$$

a Show that $b_k = c_k$, by substituting for $E_n$ in the above expression.
b From the relations given above, derive an equivalent filter structure having the
z transform

$$E(z) = \underbrace{\frac{1 - z^{-K}}{K}}_{E_1(z)}\ \underbrace{\sum_{n=0}^{K-1} \frac{E_n}{1 - e^{j2\pi n/K}z^{-1}}}_{E_2(z)}$$

c If $E(z)$ is considered as two separate filters $E_1(z)$ and $E_2(z)$ in cascade, sketch
a block diagram for each of the filters, using $z^{-1}$ to denote a unit of delay.
d In the transversal equalizer, the adjustable parameters are the equalizer
coefficients $\{c_k\}$. What are the adjustable parameters of the equivalent
equalizer in (b), and how are they related to $\{c_k\}$?



11

ADAPTIVE EQUALIZATION


In Chapter 10, we introduced both optimum and suboptimum receivers that 
compensate for ISI in the transmission of digital information through band- 
limited, nonideal channels. The optimum receiver employed maximum- 
likelihood sequence estimation for detecting the information sequence from
the samples of the demodulation filter. The suboptimum receivers employed 
either a linear equalizer or a decision-feedback equalizer. 

In the development of the three equalization methods, we implicitly 
assumed that the channel characteristics, either the impulse response or the 
frequency response, were known at the receiver. However, in most communication
systems that employ equalizers, the channel characteristics are
unknown a priori and, in many cases, the channel response is time-variant. In
such a case, the equalizers are designed to be adjustable to the channel
response and, for time-variant channels, to be adaptive to the time variations 
in the channel response. 

In this chapter, we present algorithms for automatically adjusting the
equalizer coefficients to optimize a specified performance index and to
adaptively compensate for time variations in the channel characteristics. We
also analyze the performance characteristics of the algorithms, including their
rate of convergence and their computational complexity.

11-1 ADAPTIVE LINEAR EQUALIZER 

In the case of the linear equalizer, recall that we considered two different
criteria for determining the values of the equalizer coefficients $\{c_k\}$. One
criterion was based on the minimization of the peak distortion at the output of
the equalizer, which is defined by (10-2-4). The other criterion was based on
the minimization of the mean-square error at the output of the equalizer,
which is defined by (10-2-25). Below, we describe two algorithms for
performing the optimization automatically and adaptively.


11-1-1 The Zero-Forcing Algorithm 

In the peak-distortion criterion, the peak distortion $\mathcal{D}(\mathbf{c})$, given by (10-2-22), is
minimized by selecting the equalizer coefficients $\{c_k\}$. In general, there is no
simple computational algorithm for performing this optimization, except in the
special case where the peak distortion at the input to the equalizer, defined as
$\mathcal{D}_0$ in (10-2-23), is less than unity. When $\mathcal{D}_0 < 1$, the distortion $\mathcal{D}(\mathbf{c})$ at the
output of the equalizer is minimized by forcing the equalizer response $q_n = 0$
for $1 \le |n| \le K$, and $q_0 = 1$. In this case, there is a simple computational
algorithm, called the zero-forcing algorithm, that achieves these conditions.

The zero-forcing solution is achieved by forcing the cross-correlation
between the error sequence $\varepsilon_k = I_k - \hat{I}_k$ and the desired information sequence
$\{I_k\}$ to be zero for shifts in the range $0 \le |n| \le K$. The demonstration that this
leads to the desired solution is quite simple. We have

$$ E(\varepsilon_k I_{k-j}^*) = E[(I_k - \hat{I}_k) I_{k-j}^*] = E(I_k I_{k-j}^*) - E(\hat{I}_k I_{k-j}^*), \qquad j = -K, \ldots, K $$ (11-1-1)

We assume that the information symbols are uncorrelated, i.e., $E(I_k I_j^*) = \delta_{kj}$,
and that the information sequence $\{I_k\}$ is uncorrelated with the additive noise
sequence $\{\eta_k\}$. For $\hat{I}_k$, we use the expression given in (10-2-41). Then, after
taking the expected values in (11-1-1), we obtain

$$ E(\varepsilon_k I_{k-j}^*) = \delta_{j0} - q_j, \qquad j = -K, \ldots, K $$ (11-1-2)

Therefore, the conditions

$$ E(\varepsilon_k I_{k-j}^*) = 0, \qquad j = -K, \ldots, K $$ (11-1-3)

are fulfilled when $q_0 = 1$ and $q_n = 0$, $1 \le |n| \le K$.
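
The step from (11-1-1) to (11-1-2) can be spelled out as follows. This is a sketch that assumes (10-2-41) expresses the estimate as the symbol sequence filtered by the combined channel-equalizer response $\{q_n\}$ plus filtered noise; the summation limits are left implicit.

$$ \hat{I}_k = \sum_n q_n I_{k-n} + \sum_i c_i \eta_{k-i} $$

$$ E(\hat{I}_k I_{k-j}^*) = \sum_n q_n E(I_{k-n} I_{k-j}^*) + \sum_i c_i E(\eta_{k-i} I_{k-j}^*) = \sum_n q_n \delta_{nj} + 0 = q_j $$

$$ E(I_k I_{k-j}^*) = \delta_{j0} \quad \Longrightarrow \quad E(\varepsilon_k I_{k-j}^*) = \delta_{j0} - q_j $$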

When the channel response is unknown, the cross-correlations given by
(11-1-1) are also unknown. This difficulty can be circumvented by transmitting
a known training sequence $\{I_k\}$ to the receiver, which can be used to estimate
the cross-correlation by substituting time averages for the ensemble averages
given in (11-1-1). After the initial training, which will require the transmission
of a training sequence of some predetermined length that equals or exceeds the
equalizer length, the equalizer coefficients that satisfy (11-1-3) can be
determined.
FIGURE 11-1-1


A simple recursive algorithm for adjusting the equalizer coefficients is

$$ c_j^{(k+1)} = c_j^{(k)} + \Delta \varepsilon_k I_{k-j}^*, \qquad j = -K, \ldots, -1, 0, 1, \ldots, K $$ (11-1-4)

where $c_j^{(k)}$ is the value of the jth coefficient at time t = kT, $\varepsilon_k = I_k - \hat{I}_k$ is the
error signal at time t = kT, and $\Delta$ is a scale factor that controls the rate of
adjustment, as will be explained later in this section. This is the zero-forcing
algorithm. The term $\varepsilon_k I_{k-j}^*$ is an estimate of the cross-correlation (ensemble
average) $E(\varepsilon_k I_{k-j}^*)$. The averaging operation of the cross-correlation is
accomplished by means of the recursive first-order difference equation
algorithm in (11-1-4), which represents a simple discrete-time integrator.
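
The recursion in (11-1-4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: real-valued binary symbols, a noise-free two-tap channel `f`, and a step size chosen for the demonstration. None of these values, nor the function names, come from the text.

```python
import random

def zero_forcing_train(f, K, delta, num_symbols, seed=1):
    """Adapt taps c_{-K}..c_K with the recursion (11-1-4), real symbols."""
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(num_symbols)]
    # Noise-free received samples v_k = sum_n f[n] * I_{k-n}
    v = [sum(f[n] * symbols[k - n] for n in range(len(f)) if k - n >= 0)
         for k in range(num_symbols)]
    c = [0.0] * (2 * K + 1)      # c[j + K] stores tap c_j
    c[K] = 1.0                   # start as a pass-through filter
    for k in range(K, num_symbols - K):
        estimate = sum(c[j + K] * v[k - j] for j in range(-K, K + 1))
        error = symbols[k] - estimate            # epsilon_k
        for j in range(-K, K + 1):
            c[j + K] += delta * error * symbols[k - j]
    return c

def combined_response(f, c, K):
    """q_n = sum_j c_j f_{n-j} for n = -K..K."""
    return [sum(c[j + K] * f[n - j]
                for j in range(-K, K + 1) if 0 <= n - j < len(f))
            for n in range(-K, K + 1)]

f = [1.0, 0.4]                   # hypothetical channel, input distortion 0.4 < 1
K = 3
c = zero_forcing_train(f, K, delta=0.01, num_symbols=5000)
q = combined_response(f, c, K)   # q[K] is q_0
```

After training, `q[K]` should be close to 1 and the remaining entries of `q` close to 0, which are the conditions stated after (11-1-3).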

Following the training period, after which the equalizer coefficients have
converged to their optimum values, the decisions at the output of the detector
are generally sufficiently reliable so that they may be used to continue the
coefficient adaptation process. This is called a decision-directed mode of
adaptation. In such a case, the cross-correlations in (11-1-4) involve the error
signal $\tilde{\varepsilon}_k = \tilde{I}_k - \hat{I}_k$ and the detected output sequence $\tilde{I}_{k-j}$, $j = -K, \ldots, K$. Thus,
in the adaptive mode, (11-1-4) becomes

$$ c_j^{(k+1)} = c_j^{(k)} + \Delta \tilde{\varepsilon}_k \tilde{I}_{k-j}^* $$ (11-1-5)

Figure 11-1-1 illustrates the zero-forcing equalizer in the training mode and the
adaptive mode of operation.

The characteristics of the zero-forcing algorithm are similar to those of the 
LMS algorithm, which minimizes the MSE and which is described in detail in 
the following section. 


An adaptive zero-forcing equalizer. 


















11-1-2 The LMS Algorithm 

In the minimization of the MSE, treated in Section 10-2-2, we found that the
optimum equalizer coefficients are determined from the solution of the set of
linear equations, expressed in matrix form as

$$ \boldsymbol{\Gamma} \mathbf{C} = \boldsymbol{\xi} $$ (11-1-6)

where $\boldsymbol{\Gamma}$ is the $(2K + 1) \times (2K + 1)$ covariance matrix of the signal samples
$\{v_k\}$, $\mathbf{C}$ is the column vector of $(2K + 1)$ equalizer coefficients, and $\boldsymbol{\xi}$ is a
$(2K + 1)$-dimensional column vector of channel filter coefficients. The solution
for the optimum equalizer coefficients vector $\mathbf{C}_{\mathrm{opt}}$ can be determined by
inverting the covariance matrix $\boldsymbol{\Gamma}$, which can be efficiently performed by use of
the Levinson-Durbin algorithm described in Appendix A.

Alternatively, an iterative procedure that avoids the direct matrix inversion
may be used to compute $\mathbf{C}_{\mathrm{opt}}$. Probably the simplest iterative procedure is the
method of steepest descent, in which one begins by arbitrarily choosing the
vector $\mathbf{C}$, say as $\mathbf{C}_0$. This initial choice of coefficients corresponds to some point
on the quadratic MSE surface in the $(2K + 1)$-dimensional space of
coefficients. The gradient vector $\mathbf{G}_0$, having the $2K + 1$ gradient components
$\tfrac{1}{2}\,\partial J / \partial c_{0k}$, $k = -K, \ldots, -1, 0, 1, \ldots, K$, is then computed at this point on the
MSE surface, and each tap weight is changed in the direction opposite to its
corresponding gradient component. The change in the jth tap weight is
proportional to the size of the jth gradient component. Thus, succeeding values
of the coefficient vector $\mathbf{C}$ are obtained according to the relation

$$ \mathbf{C}_{k+1} = \mathbf{C}_k - \Delta \mathbf{G}_k, \qquad k = 0, 1, 2, \ldots $$ (11-1-7)

where the gradient vector $\mathbf{G}_k$ is

$$ \mathbf{G}_k = \frac{1}{2} \frac{dJ}{d\mathbf{C}_k} = \boldsymbol{\Gamma} \mathbf{C}_k - \boldsymbol{\xi} = -E(\varepsilon_k \mathbf{V}_k^*) $$ (11-1-8)

The vector $\mathbf{C}_k$ represents the set of coefficients at the kth iteration, $\varepsilon_k = I_k - \hat{I}_k$
is the error signal at the kth iteration, $\mathbf{V}_k$ is the vector of received signal
samples that make up the estimate $\hat{I}_k$, i.e., $\mathbf{V}_k = [v_{k+K} \ \ldots \ v_k \ \ldots \ v_{k-K}]^t$, and
$\Delta$ is a positive number chosen small enough to ensure convergence of the
iterative procedure. If the minimum MSE is reached for some $k = k_0$, then
$\mathbf{G}_k = \mathbf{0}$, so that no further change occurs in the tap weights. In general, $J_{\min}(K)$
cannot be attained for a finite value of $k_0$ with the steepest-descent method. It
can, however, be approached as closely as desired for some finite value of $k_0$.
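
As an illustration of (11-1-7) and (11-1-8), the following sketch runs the steepest-descent iteration with an exactly known gradient. The 2 x 2 matrix `gamma`, the vector `xi`, and the step size are hypothetical values chosen only so that the answer can be checked by hand.

```python
def steepest_descent(gamma, xi, delta, iterations):
    """Iterate C_{k+1} = C_k - delta * G_k with G_k = Gamma C_k - xi."""
    n = len(xi)
    c = [0.0] * n                                  # arbitrary starting point C_0
    for _ in range(iterations):
        # Exact gradient (11-1-8) at the current coefficients
        g = [sum(gamma[i][j] * c[j] for j in range(n)) - xi[i]
             for i in range(n)]
        c = [c[i] - delta * g[i] for i in range(n)]   # update (11-1-7)
    return c

gamma = [[1.0, 0.5],
         [0.5, 1.0]]     # hypothetical covariance matrix, eigenvalues 0.5 and 1.5
xi = [1.0, 0.2]          # hypothetical cross-correlation vector
c_sd = steepest_descent(gamma, xi, delta=0.5, iterations=200)
```

Solving $\boldsymbol{\Gamma}\mathbf{C} = \boldsymbol{\xi}$ directly for these values gives $\mathbf{C}_{\mathrm{opt}} = [1.2, -0.4]$, which the iteration approaches without inverting the matrix.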

The basic difficulty with the method of steepest descent for determining the
optimum tap weights is the lack of knowledge of the gradient vector $\mathbf{G}_k$, which
depends on both the covariance matrix $\boldsymbol{\Gamma}$ and the vector $\boldsymbol{\xi}$ of cross-correlations.
In turn, these quantities depend on the coefficients $\{f_k\}$ of the equivalent
discrete-time channel model and on the covariance of the information
sequence and the additive noise, all of which may be unknown at the receiver
in general. To overcome the difficulty, estimates of the gradient vector may be
used. That is, the algorithm for adjusting the tap weight coefficients may be
expressed in the form

$$ \hat{\mathbf{C}}_{k+1} = \hat{\mathbf{C}}_k - \Delta \hat{\mathbf{G}}_k $$ (11-1-9)

where $\hat{\mathbf{G}}_k$ denotes an estimate of the gradient vector $\mathbf{G}_k$ and $\hat{\mathbf{C}}_k$ denotes the
estimate of the vector of coefficients.

From (11-1-8) we note that $\mathbf{G}_k$ is the negative of the expected value of
$\varepsilon_k \mathbf{V}_k^*$. Consequently, an estimate of $\mathbf{G}_k$ is

$$ \hat{\mathbf{G}}_k = -\varepsilon_k \mathbf{V}_k^* $$ (11-1-10)

Since $E(\hat{\mathbf{G}}_k) = \mathbf{G}_k$, the estimate $\hat{\mathbf{G}}_k$ is an unbiased estimate of the true gradient
vector $\mathbf{G}_k$. Incorporation of (11-1-10) into (11-1-9) yields the algorithm

$$ \hat{\mathbf{C}}_{k+1} = \hat{\mathbf{C}}_k + \Delta \varepsilon_k \mathbf{V}_k^* $$ (11-1-11)

This is the basic LMS (least-mean-square) algorithm for recursively adjusting
the tap weight coefficients of the equalizer, first proposed by Widrow and Hoff
(1960). It is illustrated in the equalizer shown in Fig. 11-1-2.
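
A minimal sketch of the LMS update (11-1-11) in training mode is given below. It assumes real-valued binary symbols (so the conjugate has no effect), a hypothetical two-tap channel with a small amount of Gaussian noise, and illustrative values for the step size and equalizer length; the function name `lms_train` is likewise an assumption.

```python
import random

def lms_train(f, K, delta, num_symbols, noise_std, seed=2):
    """Train a (2K+1)-tap linear equalizer with the LMS update (11-1-11)."""
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(num_symbols)]
    v = [sum(f[n] * symbols[k - n] for n in range(len(f)) if k - n >= 0)
         + rng.gauss(0.0, noise_std) for k in range(num_symbols)]
    c = [0.0] * (2 * K + 1)
    squared_errors = []
    for k in range(K, num_symbols - K):
        vk = [v[k - j] for j in range(-K, K + 1)]     # V_k = [v_{k+K} ... v_{k-K}]
        estimate = sum(cj * vj for cj, vj in zip(c, vk))
        error = symbols[k] - estimate                 # known training symbol
        c = [cj + delta * error * vj for cj, vj in zip(c, vk)]
        squared_errors.append(error * error)
    return c, squared_errors

c_lms, sq_err = lms_train([1.0, 0.4], K=3, delta=0.02,
                          num_symbols=5000, noise_std=0.05)
early_mse = sum(sq_err[:20]) / 20       # averaged squared error at the start
late_mse = sum(sq_err[-500:]) / 500     # averaged squared error near the end
```

Comparing `early_mse` with `late_mse` gives a rough picture of how far the squared error falls as the taps adapt.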

FIGURE 11-1-2 Linear adaptive equalizer based on MSE criterion.

The basic algorithm given by (11-1-11) and some of its possible variations
have been incorporated into many commercial adaptive equalizers that are
used in high-speed modems. Three variations of the basic algorithm are
obtained by using only sign information contained in the error signal $\varepsilon_k$ and/or
in the components of $\mathbf{V}_k$. Hence, the three possible variations are


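$$ c_{(k+1)j} = c_{kj} + \Delta \, \mathrm{csgn}(\varepsilon_k) \, v_{k-j}^*, \qquad j = -K, \ldots, -1, 0, 1, \ldots, K $$ (11-1-12)

$$ c_{(k+1)j} = c_{kj} + \Delta \, \varepsilon_k \, \mathrm{csgn}(v_{k-j}^*), \qquad j = -K, \ldots, -1, 0, 1, \ldots, K $$ (11-1-13)

$$ c_{(k+1)j} = c_{kj} + \Delta \, \mathrm{csgn}(\varepsilon_k) \, \mathrm{csgn}(v_{k-j}^*), \qquad j = -K, \ldots, -1, 0, 1, \ldots, K $$ (11-1-14)

where csgn(x) is defined as

$$ \mathrm{csgn}(x) = \begin{cases} 1 + j & (\mathrm{Re}(x) > 0, \ \mathrm{Im}(x) > 0) \\ 1 - j & (\mathrm{Re}(x) > 0, \ \mathrm{Im}(x) < 0) \\ -1 + j & (\mathrm{Re}(x) < 0, \ \mathrm{Im}(x) > 0) \\ -1 - j & (\mathrm{Re}(x) < 0, \ \mathrm{Im}(x) < 0) \end{cases} $$ (11-1-15)

(Note that in (11-1-15), $j = \sqrt{-1}$, as distinct from the index j in (11-1-12)-(11-1-14).)
Clearly, the algorithm in (11-1-14) is the most easily implemented, but
it gives the slowest rate of convergence relative to the others.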
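
The complex sign function of (11-1-15) and one step of the sign-sign variant (11-1-14) might be coded as follows. The treatment of a zero real or imaginary part (mapped to +1 here) is an assumption, since (11-1-15) leaves that case unspecified, and the function names are illustrative.

```python
def csgn(x):
    """Complex sign of (11-1-15); zero parts are mapped to +1 (an assumption)."""
    re = 1.0 if x.real >= 0 else -1.0
    im = 1.0 if x.imag >= 0 else -1.0
    return complex(re, im)

def sign_sign_update(c, v_window, error, delta):
    """One step of (11-1-14): c_j += delta * csgn(error) * csgn(conj(v_{k-j}))."""
    return [cj + delta * csgn(error) * csgn(vj.conjugate())
            for cj, vj in zip(c, v_window)]

# Single-tap illustration with arbitrary values
updated = sign_sign_update([0j], [1 + 1j], error=1 + 1j, delta=0.5)
```

Because each factor takes only four possible values, the products in (11-1-14) reduce to sign changes and swaps of real and imaginary parts, which is why this variant is the cheapest to implement.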

Several other variations of the LMS algorithm are obtained by averaging or
filtering the gradient vectors over several iterations prior to making
adjustments of the equalizer coefficients. For example, the average over N gradient
vectors is

$$ \bar{\mathbf{G}}_{mN} = -\frac{1}{N} \sum_{n=0}^{N-1} \varepsilon_{mN+n} \mathbf{V}_{mN+n}^* $$ (11-1-16)

and the corresponding recursive equation for updating the equalizer
coefficients once every N iterations is

$$ \hat{\mathbf{C}}_{(k+1)N} = \hat{\mathbf{C}}_{kN} - \Delta \bar{\mathbf{G}}_{kN} $$ (11-1-17)

In effect, the averaging operation performed in (11-1-16) reduces the noise in
the estimate of the gradient vector, as shown by Gardner (1984).

An alternative approach is to filter the noisy gradient vectors by a lowpass
filter and use the output of the filter as an estimate of the gradient vector. For
example, a simple lowpass filter for the noisy gradients yields as an output

$$ \bar{\mathbf{G}}_k = w \bar{\mathbf{G}}_{k-1} + (1 - w) \hat{\mathbf{G}}_k, \qquad \bar{\mathbf{G}}(0) = \hat{\mathbf{G}}(0) $$ (11-1-18)

where the choice of $0 \le w < 1$ determines the bandwidth of the lowpass filter.
When w is close to unity, the filter bandwidth is small and the effective
averaging is performed over many gradient vectors. On the other hand, when
w is small, the lowpass filter has a large bandwidth and, hence, it provides little
averaging of the gradient vectors. With the filtered gradient vectors given by
(11-1-18) in place of $\hat{\mathbf{G}}_k$, we obtain the filtered gradient LMS algorithm given
by

$$ \hat{\mathbf{C}}_{k+1} = \hat{\mathbf{C}}_k - \Delta \bar{\mathbf{G}}_k $$ (11-1-19)
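
The first-order lowpass filter of (11-1-18) can be sketched as below. The input gradient values are made up for the illustration, and the helper name `filtered_gradients` is an assumption.

```python
def filtered_gradients(noisy_gradients, w):
    """Smooth a sequence of gradient vectors with the recursion (11-1-18)."""
    smoothed = [list(noisy_gradients[0])]        # Gbar(0) = Ghat(0)
    for g in noisy_gradients[1:]:
        previous = smoothed[-1]
        smoothed.append([w * p + (1.0 - w) * gi
                         for p, gi in zip(previous, g)])
    return smoothed

# One-dimensional gradients that alternate between 1 and 3 (illustrative only)
noisy = [[1.0], [3.0], [1.0], [3.0]]
smooth = filtered_gradients(noisy, w=0.5)
```

With `w = 0.5` the filtered sequence swings less than the raw one, and a value of `w` closer to unity would average over more gradient vectors, as described above.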

In the above discussion, it has been assumed that the receiver has 
knowledge of the transmitted information sequence in forming the error signal 
between the desired symbol and its estimate. Such knowledge can be made 
available during a short training period in which a signal with a known 
information sequence is transmitted to the receiver for initially adjusting the 
tap weights. The length of this sequence must be at least as long as the length 
of the equalizer so that the spectrum of the transmitted signal adequately 
covers the bandwidth of the channel being equalized. 

In practice, the training sequence is often selected to be a periodic
pseudo-random sequence, such as a maximum length shift-register sequence
whose period N is equal to the length of the equalizer (N = 2K + 1). In this
case, the gradient is usually averaged over the length of the sequence as
indicated in (11-1-16) and the equalizer is adjusted once a period according to
(11-1-17). A practical scheme for continuous adjustment of the tap weights
may be either a decision-directed mode of operation in which decisions on the
information symbols are assumed to be correct and used in place of $I_k$ in
forming the error signal $\varepsilon_k$, or one in which a known pseudo-random-probe
sequence is inserted in the information-bearing signal either additively or by
interleaving in time and the tap weights adjusted by comparing the received
probe symbols with the known transmitted probe symbols. In the
decision-directed mode of operation, the error signal becomes $\tilde{\varepsilon}_k = \tilde{I}_k - \hat{I}_k$, where $\tilde{I}_k$ is
the decision of the receiver based on the estimate $\hat{I}_k$. As long as the receiver is
operating at low error rates, an occasional error will have a negligible effect on
the convergence of the algorithm.

If the channel response changes, this change is reflected in the coefficients
$\{f_k\}$ of the equivalent discrete-time channel model. It is also reflected in the
error signal $\varepsilon_k$, since it depends on $\{f_k\}$. Hence, the tap weights will be changed
according to (11-1-11) to reflect the change in the channel. A similar change in
the tap weights occurs if the statistics of the noise or the information sequence
change. Thus, the equalizer is adaptive.


11-1-3 Convergence Properties of the LMS Algorithm 

The convergence properties of the LMS algorithm given by (11-1-11) are
governed by the step-size parameter $\Delta$. We shall now consider the choice of
the parameter $\Delta$ to ensure convergence of the steepest-descent algorithm in
(11-1-7), which employs the exact value of the gradient.

From (11-1-7) and (11-1-8), we have

$$ \mathbf{C}_{k+1} = \mathbf{C}_k - \Delta \mathbf{G}_k = (\mathbf{I} - \Delta \boldsymbol{\Gamma}) \mathbf{C}_k + \Delta \boldsymbol{\xi} $$ (11-1-20)

where $\mathbf{I}$ is the identity matrix, $\boldsymbol{\Gamma}$ is the autocorrelation matrix of the received
signal, $\mathbf{C}_k$ is the $(2K + 1)$-dimensional vector of equalizer tap gains, and $\boldsymbol{\xi}$ is the
vector of cross-correlations given by (10-2-45). The recursive relation in
(11-1-20) can be represented as a closed-loop control system as shown in Fig.
11-1-3. Unfortunately, the set of $2K + 1$ first-order difference equations in
(11-1-20) are coupled through the autocorrelation matrix $\boldsymbol{\Gamma}$. In order to solve
these equations and, thus, establish the convergence properties of the recursive
algorithm, it is mathematically convenient to decouple the equations by
performing a linear transformation. The appropriate transformation is
obtained by noting that the matrix $\boldsymbol{\Gamma}$ is Hermitian and, hence, can be
represented as

$$ \boldsymbol{\Gamma} = \mathbf{U} \boldsymbol{\Lambda} \mathbf{U}^{t*} $$ (11-1-21)

where $\mathbf{U}$ is the normalized modal matrix of $\boldsymbol{\Gamma}$ and $\boldsymbol{\Lambda}$ is a diagonal matrix with
diagonal elements equal to the eigenvalues of $\boldsymbol{\Gamma}$.

FIGURE 11-1-3 Closed-loop control system representation of recursive
equation in (11-1-20).

When (11-1-21) is substituted into (11-1-20) and if we define the
transformed (orthogonalized) vectors $\mathbf{C}_k^o = \mathbf{U}^{t*} \mathbf{C}_k$ and $\boldsymbol{\xi}^o = \mathbf{U}^{t*} \boldsymbol{\xi}$, we obtain

$$ \mathbf{C}_{k+1}^o = (\mathbf{I} - \Delta \boldsymbol{\Lambda}) \mathbf{C}_k^o + \Delta \boldsymbol{\xi}^o $$ (11-1-22)

This set of first-order difference equations is now decoupled. Their
convergence is determined from the homogeneous equation

$$ \mathbf{C}_{k+1}^o = (\mathbf{I} - \Delta \boldsymbol{\Lambda}) \mathbf{C}_k^o $$ (11-1-23)

We see that the recursive relation will converge provided that all the poles lie
inside the unit circle, i.e.,

$$ |1 - \Delta \lambda_k| < 1, \qquad k = -K, \ldots, -1, 0, 1, \ldots, K $$ (11-1-24)

where $\{\lambda_k\}$ is the set of $2K + 1$ (possibly nondistinct) eigenvalues of $\boldsymbol{\Gamma}$. Since $\boldsymbol{\Gamma}$
is an autocorrelation matrix, it is positive-definite and, hence, $\lambda_k > 0$ for all k.
Consequently, convergence of the recursive relation in (11-1-22) is ensured if $\Delta$
satisfies the inequality

$$ 0 < \Delta < \frac{2}{\lambda_{\max}} $$ (11-1-25)

where $\lambda_{\max}$ is the largest eigenvalue of $\boldsymbol{\Gamma}$.

Since the largest eigenvalue of a positive-definite matrix is less than the sum
of all the eigenvalues of the matrix and, furthermore, since the sum of the
eigenvalues of a matrix is equal to its trace, we have the following simple upper
bound on $\lambda_{\max}$:

$$ \lambda_{\max} < \sum_{k=-K}^{K} \lambda_k = \mathrm{tr}\, \boldsymbol{\Gamma} = (2K + 1)\Gamma_{kk} = (2K + 1)(x_0 + N_0) $$ (11-1-26)

From (11-1-23) and (11-1-24) we observe that rapid convergence occurs
when $|1 - \Delta \lambda_k|$ is small, i.e., when the pole positions are far from the unit
circle. But we cannot achieve this desirable condition and still satisfy (11-1-25)
if there is a large difference between the largest and smallest eigenvalues of $\boldsymbol{\Gamma}$.
In other words, even if we select $\Delta$ to be near the upper bound given in
(11-1-25), the convergence rate of the recursive MSE algorithm is determined
by the smallest eigenvalue $\lambda_{\min}$. Consequently, the ratio $\lambda_{\max}/\lambda_{\min}$ ultimately
determines the convergence rate. If $\lambda_{\max}/\lambda_{\min}$ is small, $\Delta$ can be selected so as
to achieve rapid convergence. However, if the ratio $\lambda_{\max}/\lambda_{\min}$ is large, as is the
case when the channel frequency response has deep spectral nulls, the
convergence rate of the algorithm will be slow.
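
The effect of the eigenvalue spread can be made concrete with a small calculation on the decoupled modes of (11-1-23), each of which decays as $(1 - \Delta\lambda_k)^n$. The eigenvalues, step sizes, and tolerance below are hypothetical, and the helper name is an assumption.

```python
import math

def iterations_to_converge(lam, delta, tol=1e-3):
    """Smallest n with |1 - delta*lam|**n <= tol for one mode of (11-1-23).

    Returns None when the mode violates (11-1-24) and does not converge.
    """
    rho = abs(1.0 - delta * lam)
    if rho >= 1.0:
        return None
    if rho == 0.0:
        return 1
    return math.ceil(math.log(tol) / math.log(rho))

# Step size fixed at 1/lambda_max with lambda_max = 1 (illustrative values)
slow_mode = iterations_to_converge(lam=0.1, delta=1.0)   # eigenvalue spread 10
fast_mode = iterations_to_converge(lam=0.5, delta=1.0)   # eigenvalue spread 2
unstable = iterations_to_converge(lam=1.0, delta=2.5)    # delta > 2/lambda_max
```

With the step size held fixed, the mode tied to the smaller eigenvalue needs several times as many iterations, and a step size beyond $2/\lambda_{\max}$ produces a mode that never settles.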


11-1-4 Excess MSE Due to Noisy Gradient Estimates 

The recursive algorithm in (11-1-11) for adjusting the coefficients of the linear
equalizer employs unbiased noisy estimates of the gradient vector. The noise in
these estimates causes random fluctuations in the coefficients about their
optimal values and, thus, leads to an increase in the MSE at the output of the
equalizer. That is, the final MSE is $J_{\min} + J_\Delta$, where $J_\Delta$ is the variance of the
measurement noise. The term $J_\Delta$ due to the estimation noise has been termed
excess mean-square error by Widrow (1966).

The total MSE at the output of the equalizer for any set of coefficients $\mathbf{C}$
can be expressed as

$$ J = J_{\min} + (\mathbf{C} - \mathbf{C}_{\mathrm{opt}})^{t*} \boldsymbol{\Gamma} (\mathbf{C} - \mathbf{C}_{\mathrm{opt}}) $$ (11-1-27)

where $\mathbf{C}_{\mathrm{opt}}$ represents the optimum coefficients, which satisfy (11-1-6). This
expression for the MSE can be simplified by performing the linear orthogonal
transformation used above to establish convergence. The result of this
transformation applied to (11-1-27) is

$$ J = J_{\min} + \sum_{k=-K}^{K} \lambda_k E|c_k^o - c_{k\,\mathrm{opt}}^o|^2 $$ (11-1-28)

where the $\{c_k^o\}$ are the set of transformed equalizer coefficients. The excess
MSE is the expected value of the second term in (11-1-28), i.e.,

$$ J_\Delta = \sum_{k=-K}^{K} \lambda_k E|c_k^o - c_{k\,\mathrm{opt}}^o|^2 $$ (11-1-29)
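It has been shown by Widrow (1970, 1975) that the excess MSE is

$$ J_\Delta = \Delta^2 J_{\min} \sum_{k=-K}^{K} \frac{\lambda_k^2}{1 - (1 - \Delta \lambda_k)^2} $$ (11-1-30)

The expression in (11-1-30) can be simplified when $\Delta$ is selected such that
$\Delta \lambda_k \ll 1$ for all k. Then

$$ J_\Delta \approx \tfrac{1}{2} \Delta J_{\min} \sum_{k=-K}^{K} \lambda_k \approx \tfrac{1}{2} \Delta J_{\min} \, \mathrm{tr}\, \boldsymbol{\Gamma} \approx \tfrac{1}{2} \Delta (2K + 1) J_{\min} (x_0 + N_0) $$ (11-1-31)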


Note that $x_0 + N_0$ represents the received signal plus noise power.

It is desirable to have $J_\Delta < J_{\min}$. That is, $\Delta$ should be selected such that

$$ \frac{J_\Delta}{J_{\min}} \approx \tfrac{1}{2} \Delta (2K + 1)(x_0 + N_0) < 1 $$

or, equivalently,

$$ \Delta < \frac{2}{(2K + 1)(x_0 + N_0)} $$ (11-1-32)

For example, if $\Delta$ is selected as

$$ \Delta = \frac{0.2}{(2K + 1)(x_0 + N_0)} $$ (11-1-33)

the degradation in the output SNR of the equalizer due to the excess MSE is
less than 1 dB.
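
The numbers behind (11-1-32) and (11-1-33) can be checked with a short calculation. The sketch assumes that the SNR degradation is measured as $10 \log_{10}[(J_{\min} + J_\Delta)/J_{\min}]$ with $J_\Delta/J_{\min}$ taken from (11-1-31); that interpretation, the function names, and the choice of an 11-tap equalizer with unit input power are assumptions made for the illustration.

```python
import math

def step_size_bound(K, signal_plus_noise_power):
    """Upper bound (11-1-32): 2 / ((2K+1)(x0 + N0))."""
    return 2.0 / ((2 * K + 1) * signal_plus_noise_power)

def snr_degradation_db(delta, K, signal_plus_noise_power):
    """10*log10(1 + J_delta/J_min), with the ratio from (11-1-31)."""
    ratio = 0.5 * delta * (2 * K + 1) * signal_plus_noise_power
    return 10.0 * math.log10(1.0 + ratio)

K = 5                                   # 2K + 1 = 11 taps
power = 1.0                             # x0 + N0 normalized to unity
bound = step_size_bound(K, power)       # roughly 0.18
delta = 0.2 / ((2 * K + 1) * power)     # the choice in (11-1-33)
loss_db = snr_degradation_db(delta, K, power)
```

Under these assumptions the choice in (11-1-33) gives $J_\Delta/J_{\min} = 0.1$, which corresponds to a loss of about 0.4 dB, consistent with the statement that the degradation is less than 1 dB.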

The analysis given above on the excess mean square error is based on the
assumption that the mean value of the equalizer coefficients has converged to
the optimum value $\mathbf{C}_{\mathrm{opt}}$. Under this condition, the step size $\Delta$ should satisfy the
bound in (11-1-32). On the other hand, we have determined that convergence
of the mean coefficient vector requires that $\Delta < 2/\lambda_{\max}$. While a choice of $\Delta$
near the upper bound $2/\lambda_{\max}$ may lead to initial convergence of the
deterministic (known) steepest-descent gradient algorithm, such a large value
of $\Delta$ will usually result in instability of the LMS stochastic gradient algorithm.

The initial convergence or transient behavior of the LMS algorithm has
been investigated by several researchers. Their results clearly indicate that the
step size must be reduced in direct proportion to the length of the equalizer as
specified by (11-1-32). Hence, the upper bound given by (11-1-32) is also
necessary to ensure the initial convergence of the LMS algorithm. The papers
by Gitlin and Weinstein (1979) and Ungerboeck (1972) contain analyses of the
transient behavior and the convergence properties of the LMS algorithm.
FIGURE 11-1-4 Initial convergence characteristics of the LMS
algorithm with different step sizes (horizontal axis: number of iterations).
[From Digital Signal Processing, by J. G. Proakis and D. G. Manolakis,
1988, Macmillan Publishing Company. Reprinted with
permission of the publisher.]


The following example serves to reinforce the important points made above 
regarding the initial convergence of the LMS algorithm. 


Example 11-1-1 

The LMS algorithm was used to adaptively equalize a communication
channel for which the autocorrelation matrix $\boldsymbol{\Gamma}$ has an eigenvalue spread of
$\lambda_{\max}/\lambda_{\min} = 11$. The number of taps selected for the equalizer was 2K + 1 =
11. The input signal plus noise power $x_0 + N_0$ was normalized to unity.
Hence, the upper bound on $\Delta$ given by (11-1-32) is 0.18. Figure 11-1-4
illustrates the initial convergence characteristics of the LMS algorithm for
$\Delta$ = 0.045, 0.09, and 0.115, by averaging the (estimated) MSE in 200
simulations. We observe that by selecting $\Delta$ = 0.09 (one-half of the upper
bound) we obtain relatively fast initial convergence. If we divide $\Delta$ by a
factor of 2 to $\Delta$ = 0.045, the convergence rate is reduced but the excess
mean square error is also reduced, so that the LMS algorithm performs
better in steady state (in a time-invariant signal environment). Finally, we
note that a choice of $\Delta$ = 0.115, which is still far below the upper bound,
causes large undesirable fluctuations in the output MSE of the algorithm.


In a digital implementation of the LMS algorithm, the choice of the
step-size parameter becomes even more critical. In an attempt to reduce the
excess mean square error, it is possible to reduce the step-size parameter to the
point where the total mean square error actually increases. This condition
occurs when the estimated gradient components of the vector $\varepsilon_k \mathbf{V}_k^*$ after
multiplication by the small step-size parameter $\Delta$ are smaller than one-half of
the least significant bit in the fixed-point representation of the equalizer
coefficients. In such a case, adaptation ceases. Consequently, it is important for
the step size to be large enough to bring the equalizer coefficients in the
vicinity of $\mathbf{C}_{\mathrm{opt}}$. If it is desired to decrease the step size significantly, it is
necessary to increase the precision in the equalizer coefficients. Typically, 16
bits of precision may be used for the coefficients, with about 10-12 of the most
significant bits used for arithmetic operations in the equalization of the data.
The remaining least significant bits are required to provide the necessary
precision for the adaptation process. Thus, the scaled, estimated gradient
components $\Delta \varepsilon_k \mathbf{V}_k^*$ usually affect only the least-significant bits in any one
iteration. In effect, the added precision also allows for the noise to be averaged
out, since many incremental changes in the least-significant bits are required
before any change occurs in the upper more significant bits used in arithmetic
operations for equalizing the data. For an analysis of roundoff errors in a
digital implementation of the LMS algorithm, the reader is referred to the
papers by Gitlin and Weinstein (1979), Gitlin et al. (1982), and Caraiscos and
Liu (1984).
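
The stalling effect described above can be reproduced with a toy fixed-point model. The sketch below rounds a coefficient to a grid of $2^{-b}$ after every update; the word lengths, starting value, and increment are illustrative assumptions, and a real equalizer would use its own fixed-point arithmetic rather than this floating-point emulation.

```python
def quantized_update(coeff, increment, frac_bits):
    """Add an increment, then round to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round((coeff + increment) * scale) / scale

def adapt(coeff, increment, frac_bits, steps):
    """Apply the same scaled gradient increment repeatedly."""
    for _ in range(steps):
        coeff = quantized_update(coeff, increment, frac_bits)
    return coeff

# The increment 0.001 is below half an LSB for 8 fractional bits (about 0.00195)
coarse = adapt(0.5, 0.001, frac_bits=8, steps=100)    # never moves
fine = adapt(0.5, 0.001, frac_bits=16, steps=100)     # accumulates the updates
```

With 8 fractional bits every update is rounded away and the coefficient stays at its starting value, while with 16 fractional bits the same updates accumulate in the low-order bits.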

As a final point, we should indicate that the LMS algorithm is appropriate
for tracking slowly time-variant signal statistics. In such a case, the minimum
MSE and the optimum coefficient vector will be time-variant. In other words,
$J_{\min}(n)$ is a function of time and the $(2K + 1)$-dimensional error surface is
moving with the time index n. The LMS algorithm attempts to follow the
moving minimum $J_{\min}(n)$ in the $(2K + 1)$-dimensional space, but it is always
lagging behind due to its use of (estimated) gradient vectors. As a
consequence, the LMS algorithm incurs another form of error, called the lag error,
whose mean square value decreases with an increase in the step size $\Delta$. The
total MSE can now be expressed as

$$ J_{\mathrm{total}} = J_{\min}(n) + J_\Delta + J_l $$

where $J_l$ denotes the mean square error due to the lag.

In any given nonstationary adaptive equalization problem, if we plot the
errors $J_\Delta$ and $J_l$ as a function of $\Delta$, we expect these errors to behave as
illustrated in Fig. 11-1-5. We observe that $J_\Delta$ increases with an increase in $\Delta$
while $J_l$ decreases with an increase in $\Delta$. The total error will exhibit a
minimum, which will determine the optimum choice of the step-size parameter.

When the statistical time variations of the signal occur rapidly, the lag error
will dominate the performance of the adaptive equalizer. In such a case,
$J_l \gg J_{\min} + J_\Delta$, even when the largest possible value of $\Delta$ is used. When this
condition occurs, the LMS algorithm is inappropriate for the application and
one must rely on the more complex recursive least-squares algorithms
described in Section 11-4 to obtain faster convergence and tracking.

FIGURE 11-1-5 Excess mean square error $J_\Delta$ and lag
error $J_l$ as a function of the step size (vertical axis: mean square error).
[From Digital Signal Processing, by J. G.
Proakis and D. G. Manolakis, 1988,
Macmillan Publishing Company.
Reprinted with permission of the
publisher.]

FIGURE 11-1-6 QAM signal demodulation.


11-1-5 Baseband and Passband Linear Equalizers 

Our treatment of adaptive linear equalizers has been in terms of equivalent
lowpass signals. However, in a practical implementation, the linear adaptive
equalizer shown in Fig. 11-1-2 can be realized either at baseband or at
bandpass. For example, Fig. 11-1-6 illustrates the demodulation of QAM (or
multiphase PSK) by first translating the signal to baseband and equalizing the
baseband signal with an equalizer having complex-valued coefficients. In effect,
the complex equalizer with complex-valued (in-phase and quadrature
components) input is equivalent to four parallel equalizers with real-valued tap
coefficients as shown in Fig. 11-1-7.
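
The equivalence between one complex-valued equalizer and four real-valued ones follows from expanding the complex product tap by tap. The sketch below checks this numerically for arbitrary tap and input values; the function names are illustrative.

```python
def complex_filter(c, v):
    """Output of a complex-tap filter for one window of complex inputs."""
    return sum(cj * vj for cj, vj in zip(c, v))

def four_real_filters(c, v):
    """Same output built from four real-valued inner products."""
    cr = [x.real for x in c]
    ci = [x.imag for x in c]
    vr = [x.real for x in v]
    vi = [x.imag for x in v]
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    in_phase = dot(cr, vr) - dot(ci, vi)      # real part of the output
    quadrature = dot(cr, vi) + dot(ci, vr)    # imaginary part of the output
    return complex(in_phase, quadrature)

taps = [1 + 2j, 0.5 - 1j]        # arbitrary complex tap values
window = [3 - 1j, -2 + 4j]       # arbitrary complex input samples
direct = complex_filter(taps, window)
split = four_real_filters(taps, window)
```

Both routes give the same complex output, so the real-valued structure needs four real filters, two driven by the in-phase input and two by the quadrature input.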

FIGURE 11-1-7 Complex-valued baseband equalizer for
QAM signals.

As an alternative, we may equalize the signal at passband. This is
accomplished as shown in Fig. 11-1-8 for a two-dimensional signal constellation
such as QAM and PSK. The received signal is filtered and, in parallel, it is
passed through a Hilbert transformer, called a phase-splitting filter. Thus, we
have the equivalent of in-phase and quadrature components at passband,
which are fed to a passband complex equalizer. Following the equalization, the
signal is down-converted to baseband and detected. The error signal
generated for the purpose of adjusting the equalizer coefficients is formed at
baseband and frequency-translated to passband as illustrated in Fig. 11-1-8.

FIGURE 11-1-8 QAM or PSK signal equalization at passband.


11-2 ADAPTIVE DECISION-FEEDBACK EQUALIZER 

As in the case of the linear adaptive equalizer, the coefficients of the
feedforward filter and the feedback filter in a decision-feedback equalizer may
be adjusted recursively, instead of inverting a matrix as implied by (10-3-3).
Based on the minimization of the MSE at the output of the DFE, the
steepest-descent algorithm takes the form

$$ \hat{\mathbf{C}}_{k+1} = \hat{\mathbf{C}}_k + \Delta E(\varepsilon_k \mathbf{V}_k^*) $$ (11-2-1)

where $\hat{\mathbf{C}}_k$ is the vector of equalizer coefficients in the kth signal interval,
$E(\varepsilon_k \mathbf{V}_k^*)$ is the cross-correlation of the error signal $\varepsilon_k = I_k - \hat{I}_k$ with $\mathbf{V}_k$, and
$\mathbf{V}_k = [v_{k+K_1} \ \ldots \ v_k \ I_{k-1} \ \ldots \ I_{k-K_2}]^t$, representing the signal values in the
feedforward and feedback filters at time t = kT. The MSE is minimized when
the cross-correlation vector $E(\varepsilon_k \mathbf{V}_k^*) = \mathbf{0}$ as $k \to \infty$.

Since the exact cross-correlation vector is unknown at any time instant, we
use as an estimate the vector $\varepsilon_k \mathbf{V}_k^*$ and average out the noise in the estimate
through the recursive equation

$$ \hat{\mathbf{C}}_{k+1} = \hat{\mathbf{C}}_k + \Delta \varepsilon_k \mathbf{V}_k^* $$ (11-2-2)

This is the LMS algorithm for the DFE.
FIGURE 11-2-1 Decision-feedback equalizer.


As in the case of a linear equalizer, we may use a training sequence to
adjust the coefficients of the DFE initially. Upon convergence to the (near-)
optimum coefficients (minimum MSE), we may switch to a decision-directed
mode where the decisions at the output of the detector are used in forming the
error signal $\tilde{\varepsilon}_k$ and fed to the feedback filter. This is the adaptive mode of the
DFE, which is illustrated in Fig. 11-2-1. In this case, the recursive equation for
adjusting the equalizer coefficients is

$$ \hat{\mathbf{C}}_{k+1} = \hat{\mathbf{C}}_k + \Delta \tilde{\varepsilon}_k \mathbf{V}_k^* $$ (11-2-3)

where $\tilde{\varepsilon}_k = \tilde{I}_k - \hat{I}_k$ and $\mathbf{V}_k = [v_{k+K_1} \ \ldots \ v_k \ \tilde{I}_{k-1} \ \ldots \ \tilde{I}_{k-K_2}]^t$.

The performance characteristics of the LMS algorithm for the DFE are
basically the same as those developed in Sections 11-1-3 and 11-1-4 for
the linear adaptive equalizer.
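
A compact sketch of the DFE recursions (11-2-2) and (11-2-3) is shown below: the equalizer is trained on known symbols and then switched to decision-directed operation. It assumes real-valued binary symbols, a hypothetical three-tap channel with postcursor ISI only, and illustrative values for the filter lengths, step size, and noise level; the function name `dfe_lms` is an assumption.

```python
import random

def dfe_lms(f, K1, K2, delta, num_symbols, train_len, noise_std, seed=3):
    """LMS-adapted DFE: training for k < train_len, decision-directed after."""
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(num_symbols)]
    v = [sum(f[n] * symbols[k - n] for n in range(len(f)) if k - n >= 0)
         + rng.gauss(0.0, noise_std) for k in range(num_symbols)]
    c = [0.0] * (K1 + 1 + K2)    # feedforward taps, then feedback taps
    past = [0.0] * K2            # symbols in the feedback filter, newest first
    errors = 0
    for k in range(num_symbols - K1):
        # V_k = [v_{k+K1} ... v_k  I_{k-1} ... I_{k-K2}]
        vk = [v[k + K1 - i] for i in range(K1 + 1)] + past
        estimate = sum(cj * xj for cj, xj in zip(c, vk))
        decision = 1.0 if estimate >= 0.0 else -1.0
        training = k < train_len
        reference = symbols[k] if training else decision
        error = reference - estimate
        c = [cj + delta * error * xj for cj, xj in zip(c, vk)]
        if not training and decision != symbols[k]:
            errors += 1          # count decision errors after training
        past = ([reference] + past)[:K2]
    return c, errors

f_dfe = [1.0, 0.5, 0.2]          # hypothetical channel
c_dfe, decision_errors = dfe_lms(f_dfe, K1=2, K2=2, delta=0.02,
                                 num_symbols=4000, train_len=1500,
                                 noise_std=0.05)
```

For this channel the tap on $v_k$ should settle near 1 and the two feedback taps near -0.5 and -0.2, so that the feedback filter cancels the postcursor ISI from the previously detected symbols.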


11-2-1 Adaptive Equalization of Trellis-Coded Signals 

Bandwidth efficient trellis-coded modulation that was described in Section 8-3 
is frequently used in digital communications over telephone channels to reduce 
the required SNR per bit for achieving a specified error rate. Channel 
distortion of the trellis-coded signal forces us to use adaptive equalization in 
order to reduce the intersymbol interference. The output of the equalizer is 
then fed to the Viterbi decoder, which performs soft-decision decoding of the 
trellis-coded signal. 
FIGURE 11-2-2


FIGURE 11-2-3

The question that arises regarding such a receiver is how do we adapt the 
equalizer in a data transmission mode? One possibility is to have the equalizer 
make its own decisions at its output solely for the purpose of generating an 
error signal for adjusting its tap coefficients, as shown in the block diagram in 
Fig. 11-2-2. The problem with this approach is that such decisions are generally 
unreliable, since the pre-decoding coded symbol SNR is relatively low. A high 
error rate would cause a significant degradation in the operation of the 
equalizer, which would ultimately affect the reliability of the decisions at the 
output of the decoder. The more desirable alternative is to use the post- 
decoding decisions from the Viterbi decoder, which are much more reliable, to 
continuously adapt the equalizer. This approach is certainly preferable and 
viable when a linear equalizer is used prior to the Viterbi decoder. The 
decoding delay inherent in the Viterbi decoder can be overcome by introduc- 
ing an identical delay in the tap weight adjustment of the equalizer coefficients 
as shown in Fig. 11-2-3. The major price that must be paid for the added delay 
is that the step-size parameter in the LMS algorithm must be reduced, as 
described by Long et al. (1987, 1989), in order to achieve stability in the 
algorithm. 

In channels with one or more in-band spectral nulls, the linear equalizer is
no longer adequate for compensating the channel intersymbol interference.
Instead, we should like to use a DFE. But the DFE requires reliable decisions
in its feedback filter in order to cancel out the intersymbol interference from
previously detected symbols. Tentative decisions prior to decoding would be
highly unreliable and, hence, inappropriate. Unfortunately, the conventional
DFE cannot be cascaded with the Viterbi algorithm in which post-decoding
decisions from the decoder are fed back to the DFE.

Adjustment of equalizer based on decisions from the Viterbi decoder.

FIGURE 11-2-4 Use of predictive DFE with interleaving and trellis-coded modulation.
(a) Transmitter. (b) Receiver.

One alternative is to use the predictive DFE described in Section 10-3-3. In 
order to accommodate the decoding delay as it affects the linear predictor, 
we introduce a periodic interleaver/deinterleaver pair that has the same delay 
as the Viterbi decoder and, thus, makes it possible to generate the appropriate 
error signal to the predictor as illustrated in the block diagram of Fig. 11-2-4. 
The novel way in which a predictive DFE can be combined with Viterbi 
decoding to equalize trellis-coded signals is described and analyzed by 
Eyuboglu (1988). This same idea has been carried over to the equalization of 
fading multipath channels by Zhou et al. (1988, 1990), but the structure of the 
DFE was modified to use recursive least-squares lattice-type filters, which 
provide faster adaptation to the time variations encountered in the channel. 


11-3 AN ADAPTIVE CHANNEL ESTIMATOR 
FOR ML SEQUENCE DETECTION 

FIGURE 11-3-1 Block diagram of method for estimating the channel characteristics for the Viterbi algorithm. 

The ML sequence detection criterion implemented via the Viterbi algorithm as 
embodied in the metric computation given by (10-1-23) and the probabilistic 
symbol-by-symbol detection algorithm described in Section 5-1-5 require 
knowledge of the equivalent discrete-time channel coefficients $\{f_k\}$. To 
accommodate a channel that is unknown or slowly time-varying, one may include a 
channel estimator connected in parallel with the detection algorithm, as shown 
in Fig. 11-3-1. The channel estimator, which is shown in Fig. 11-3-2, is identical 
in structure to the linear transversal equalizer discussed previously in Section 
11-1. In fact, the channel estimator is a replica of the equivalent discrete-time 
channel filter that models the intersymbol interference. The estimated tap 
coefficients, denoted by $\{\hat{f}_k\}$, are adjusted recursively to minimize the MSE 
between the actual received sequence and the output of the estimator. For 
example, the steepest-descent algorithm in a decision-directed mode of 
operation is 

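$$\hat{\mathbf{f}}_{k+1} = \hat{\mathbf{f}}_k + \Delta \varepsilon_k \tilde{\mathbf{I}}_k^* \tag{11-3-1}$$

where $\hat{\mathbf{f}}_k$ is the vector of tap gain coefficients at the kth iteration, $\Delta$ is the step 
size, $\varepsilon_k = v_k - \hat{v}_k$ is the error signal, and $\tilde{\mathbf{I}}_k$ denotes the vector of detected 
information symbols in the channel estimator at the kth iteration. 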

FIGURE 11-3-2 Adaptive transversal filter for estimating the channel dispersion. 

We now show that when the MSE between $v_k$ and $\hat{v}_k$ is minimized, the 
resulting values of the tap gain coefficients of the channel estimator are the 
values of the discrete-time channel model. For mathematical tractability, we 
assume that the detected information sequence $\{\tilde{I}_k\}$ is correct, i.e., $\{\tilde{I}_k\}$ is 
identical to the transmitted sequence $\{I_k\}$. This is a reasonable assumption 
when the system is operating at a low probability of error. Thus, the MSE 
between the received signal $v_k$ and the estimate $\hat{v}_k$ is 



$$J(\hat{\mathbf{f}}) = E\left( \left| v_k - \sum_{j=0}^{N-1} \hat{f}_j I_{k-j} \right|^2 \right) \tag{11-3-2}$$

The tap coefficients $\{\hat{f}_j\}$ that minimize $J(\hat{\mathbf{f}})$ in (11-3-2) satisfy the set of N linear 
equations 

$$\sum_{j=0}^{N-1} \hat{f}_j \phi_{kj} = d_k, \qquad k = 0, 1, \ldots, N-1 \tag{11-3-3}$$

where 

$$\phi_{kj} = E(I_k I_j^*), \qquad d_k = \sum_{j=0}^{L} f_j \phi_{kj} \tag{11-3-4}$$

From (11-3-3) and (11-3-4), we conclude that, as long as the information 
sequence $\{I_n\}$ is uncorrelated, the optimum coefficients are exactly equal to the 
respective values of the equivalent discrete-time channel. It is also apparent 
that when the number of taps N in the channel estimator is greater than or 
equal to $L + 1$, the optimum tap gain coefficients $\{\hat{f}_k\}$ are equal to the 
respective values of the $\{f_k\}$, even when the information sequence is correlated. 
Subject to the above conditions, the minimum MSE is simply equal to the 
noise variance $N_0$. 

In the above discussion, the estimated information sequence at the output of 
the Viterbi algorithm or the probabilistic symbol-by-symbol algorithm was 
used in making adjustments of the channel estimator. For startup operation, 
one may send a short training sequence to perform the initial adjustment of the 
tap coefficients, as is usually done in the case of the linear transversal 
equalizer. In an adaptive mode of operation, the receiver simply uses its own 
decisions to form an error signal. 
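
The adaptation in (11-3-1) can be illustrated with a minimal sketch. The function name `lms_channel_estimate`, the BPSK symbols, the noise level, and the step size 0.02 are all assumptions chosen for illustration, and the "detected" symbols are taken to be correct (as in a training period).

```python
import numpy as np

def lms_channel_estimate(v, symbols, num_taps, step):
    """Adapt f_hat so that sum_j f_hat[j] * symbols[k-j] tracks v[k], as in (11-3-1)."""
    f_hat = np.zeros(num_taps, dtype=complex)
    for k in range(len(v)):
        # Symbol vector [I_k, I_{k-1}, ..., I_{k-N+1}], with zeros before the start.
        I_k = np.array([symbols[k - j] if k - j >= 0 else 0.0
                        for j in range(num_taps)], dtype=complex)
        err = v[k] - f_hat @ I_k                 # epsilon_k = v_k - v_hat_k
        f_hat = f_hat + step * err * np.conj(I_k)
    return f_hat

# Hypothetical test channel and data (illustrative values only).
rng = np.random.default_rng(0)
f_true = np.array([0.26, 0.93, 0.26])
symbols = rng.choice([-1.0, 1.0], size=4000)
noise = 0.01 * rng.standard_normal(symbols.size)
v = np.convolve(symbols, f_true)[:symbols.size] + noise

f_hat = lms_channel_estimate(v, symbols, num_taps=3, step=0.02)
```

With uncorrelated symbols and at least $L + 1$ taps, the estimate should settle near the channel coefficients, in line with the conclusion drawn from (11-3-3) and (11-3-4).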


11-4 RECURSIVE LEAST-SQUARES ALGORITHMS 
FOR ADAPTIVE EQUALIZATION 

The LMS algorithm that we described in Sections 11-1 and 11-2 for adaptively 
adjusting the tap coefficients of a linear equalizer or a DFE is basically a 
(stochastic) steepest-descent algorithm in which the true gradient vector is 
approximated by an estimate obtained directly from the data. 

The major advantage of the steepest-descent algorithm lies in its computational 
simplicity. However, the price paid for the simplicity is slow convergence, 
especially when the channel characteristics result in an autocorrelation 
matrix $\mathbf{\Gamma}$ whose eigenvalues have a large spread, i.e., $\lambda_{\max}/\lambda_{\min} \gg 1$. Viewed in 
another way, the gradient algorithm has only a single adjustable parameter for 
controlling the convergence rate, namely, the parameter $\Delta$. Consequently, the 
slow convergence is due to this fundamental limitation. 

In order to obtain faster convergence, it is necessary to devise more complex 
algorithms involving additional parameters. In particular, if the matrix $\mathbf{\Gamma}$ is 
$N \times N$ and has eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_N$, we may use an algorithm that 
contains N parameters, one for each of the eigenvalues. The optimum 
selection of these parameters to achieve rapid convergence is a topic of this 
section. 

In deriving faster converging algorithms, we shall adopt a least-squares 
approach. Thus, we shall deal directly with the received data in minimizing the 
quadratic performance index, whereas previously we minimized the expected 
value of the squared error. Put simply, this means that the performance index 
is expressed in terms of a time average instead of a statistical average. 

It is convenient to express the recursive least-squares algorithms in matrix 
form. Hence, we shall define a number of vectors and matrices that are needed 
in this development. In so doing, we shall change the notation slightly. 
Specifically, the estimate of the information symbol at time t, where t is an 
integer, from a linear equalizer is now expressed as 

$$\hat{I}(t) = \sum_{j=-K}^{K} c_j(t-1) v_{t-j}$$

By changing the index j on $c_j(t-1)$ to run from $j = 0$ to $j = N - 1$ and 
simultaneously defining 

$$y(t) = v_{t+K}$$

the estimate $\hat{I}(t)$ becomes 

$$\hat{I}(t) = \sum_{j=0}^{N-1} c_j(t-1) y(t-j) = \mathbf{C}_N'(t-1)\mathbf{Y}_N(t) \tag{11-4-1}$$

where $\mathbf{C}_N(t-1)$ and $\mathbf{Y}_N(t)$ are, respectively, the column vectors of the 
equalizer coefficients $c_j(t-1)$, $j = 0, 1, \ldots, N-1$, and the input signals 
$y(t-j)$, $j = 0, 1, 2, \ldots, N-1$. 

Similarly, in the decision-feedback equalizer, we have tap coefficients $c_j(t)$, 
$j = 0, 1, \ldots, N-1$, where the first $K_1 + 1$ are the coefficients of the 
feedforward filter and the remaining $K_2 = N - K_1 - 1$ are the coefficients of the 
feedback filter. The data in the estimate $\hat{I}(t)$ is 
$v_{t+K_1}, \ldots, v_{t+1}, v_t, \tilde{I}_{t-1}, \ldots, \tilde{I}_{t-K_2}$, 
where $\tilde{I}_{t-j}$, $1 \le j \le K_2$, denote the decisions on previously detected symbols. In 
this development, we neglect the effect of decision errors in the algorithms. 
Hence, we assume that $\tilde{I}_{t-j} = I_{t-j}$, $1 \le j \le K_2$. For notational convenience, we 
also define 

$$y(t-j) = \begin{cases} v_{t+K_1-j} & (0 \le j \le K_1) \\ I_{t+K_1-j} & (K_1 < j \le N-1) \end{cases} \tag{11-4-2}$$

Thus, 

$$\mathbf{Y}_N(t) = [y(t) \;\; y(t-1) \;\; \cdots \;\; y(t-N+1)]' = [v_{t+K_1} \;\; \cdots \;\; v_{t+1} \;\; v_t \;\; I_{t-1} \;\; \cdots \;\; I_{t-K_2}]' \tag{11-4-3}$$


11-4-1 Recursive Least-Squares (Kalman) Algorithm 

The recursive least-squares (RLS) estimation of $I(t)$ may be formulated as 
follows. Suppose we have observed the vectors $\mathbf{Y}_N(n)$, $n = 0, 1, \ldots, t$, and we 
wish to determine the coefficient vector $\mathbf{C}_N(t)$ of the equalizer (linear or 
decision-feedback) that minimizes the time-average weighted squared error 

$$\mathscr{E}_N^{LS} = \sum_{n=0}^{t} w^{t-n} |e_N(n, t)|^2 \tag{11-4-4}$$

where the error is defined as 

$$e_N(n, t) = I(n) - \mathbf{C}_N'(t)\mathbf{Y}_N(n) \tag{11-4-5}$$

and w represents a weighting factor $0 < w < 1$. Thus we introduce exponential 
weighting into past data, which is appropriate when the channel characteristics 
are time-variant. Minimization of $\mathscr{E}_N^{LS}$ with respect to the coefficient vector 
$\mathbf{C}_N(t)$ yields the set of linear equations 

$$\mathbf{R}_N(t)\mathbf{C}_N(t) = \mathbf{D}_N(t) \tag{11-4-6}$$

where $\mathbf{R}_N(t)$ is the signal correlation matrix defined as 

$$\mathbf{R}_N(t) = \sum_{n=0}^{t} w^{t-n} \mathbf{Y}_N^*(n)\mathbf{Y}_N'(n) \tag{11-4-7}$$

and $\mathbf{D}_N(t)$ is the cross-correlation vector 

$$\mathbf{D}_N(t) = \sum_{n=0}^{t} w^{t-n} I(n)\mathbf{Y}_N^*(n) \tag{11-4-8}$$

The solution of (11-4-6) is 

$$\mathbf{C}_N(t) = \mathbf{R}_N^{-1}(t)\mathbf{D}_N(t) \tag{11-4-9}$$

The matrix $\mathbf{R}_N(t)$ is akin to the statistical autocorrelation matrix $\mathbf{\Gamma}_N$, while 
the vector $\mathbf{D}_N(t)$ is akin to the cross-correlation vector $\boldsymbol{\xi}_N$, defined previously. 
We emphasize, however, that $\mathbf{R}_N(t)$ is not a Toeplitz matrix. We also should 
mention that, for small values of t, $\mathbf{R}_N(t)$ may be ill conditioned; hence, it is 
customary to initially add the matrix $\delta\mathbf{I}_N$ to $\mathbf{R}_N(t)$, where $\delta$ is a small positive 
constant and $\mathbf{I}_N$ is the identity matrix. With exponential weighting into the 
past, the effect of adding $\delta\mathbf{I}_N$ dissipates with time. 

Now suppose we have the solution (11-4-9) for time $t - 1$, i.e., $\mathbf{C}_N(t-1)$, 
and we wish to compute $\mathbf{C}_N(t)$. It is inefficient and, hence, impractical to solve 
the set of N linear equations for each new signal component that is received. 
To avoid this, we proceed as follows. First, $\mathbf{R}_N(t)$ may be computed recursively 
as 

$$\mathbf{R}_N(t) = w\mathbf{R}_N(t-1) + \mathbf{Y}_N^*(t)\mathbf{Y}_N'(t) \tag{11-4-10}$$

We call (11-4-10) the time-update equation for $\mathbf{R}_N(t)$. 

Since the inverse of $\mathbf{R}_N(t)$ is needed in (11-4-9), we use the matrix-inverse 
identity 

$$\mathbf{R}_N^{-1}(t) = \frac{1}{w}\left[ \mathbf{R}_N^{-1}(t-1) - \frac{\mathbf{R}_N^{-1}(t-1)\mathbf{Y}_N^*(t)\mathbf{Y}_N'(t)\mathbf{R}_N^{-1}(t-1)}{w + \mathbf{Y}_N'(t)\mathbf{R}_N^{-1}(t-1)\mathbf{Y}_N^*(t)} \right] \tag{11-4-11}$$

Thus $\mathbf{R}_N^{-1}(t)$ may be computed recursively according to (11-4-11). 

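For convenience, we define $\mathbf{P}_N(t) = \mathbf{R}_N^{-1}(t)$. It is also convenient to define an 
N-dimensional vector, called the Kalman gain vector, as 

$$\mathbf{K}_N(t) = \frac{1}{w + \mu_N(t)}\mathbf{P}_N(t-1)\mathbf{Y}_N^*(t) \tag{11-4-12}$$

where $\mu_N(t)$ is a scalar defined as 

$$\mu_N(t) = \mathbf{Y}_N'(t)\mathbf{P}_N(t-1)\mathbf{Y}_N^*(t) \tag{11-4-13}$$

With these definitions, (11-4-11) becomes 

$$\mathbf{P}_N(t) = \frac{1}{w}\left[ \mathbf{P}_N(t-1) - \mathbf{K}_N(t)\mathbf{Y}_N'(t)\mathbf{P}_N(t-1) \right] \tag{11-4-14}$$

Suppose we postmultiply both sides of (11-4-14) by $\mathbf{Y}_N^*(t)$. Then 

$$\begin{aligned} \mathbf{P}_N(t)\mathbf{Y}_N^*(t) &= \frac{1}{w}\left[ \mathbf{P}_N(t-1)\mathbf{Y}_N^*(t) - \mathbf{K}_N(t)\mathbf{Y}_N'(t)\mathbf{P}_N(t-1)\mathbf{Y}_N^*(t) \right] \\ &= \frac{1}{w}\left\{ [w + \mu_N(t)]\mathbf{K}_N(t) - \mathbf{K}_N(t)\mu_N(t) \right\} \\ &= \mathbf{K}_N(t) \end{aligned} \tag{11-4-15}$$

Therefore, the Kalman gain vector may also be defined as $\mathbf{P}_N(t)\mathbf{Y}_N^*(t)$. 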

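Now we use the matrix inversion identity to derive an equation for 
obtaining $\mathbf{C}_N(t)$ from $\mathbf{C}_N(t-1)$. Since 

$$\mathbf{C}_N(t) = \mathbf{P}_N(t)\mathbf{D}_N(t)$$

and 

$$\mathbf{D}_N(t) = w\mathbf{D}_N(t-1) + I(t)\mathbf{Y}_N^*(t) \tag{11-4-16}$$

we have 

$$\begin{aligned} \mathbf{C}_N(t) &= \frac{1}{w}\left[ \mathbf{P}_N(t-1) - \mathbf{K}_N(t)\mathbf{Y}_N'(t)\mathbf{P}_N(t-1) \right]\left[ w\mathbf{D}_N(t-1) + I(t)\mathbf{Y}_N^*(t) \right] \\ &= \mathbf{P}_N(t-1)\mathbf{D}_N(t-1) + \frac{1}{w}I(t)\mathbf{P}_N(t-1)\mathbf{Y}_N^*(t) \\ &\quad - \mathbf{K}_N(t)\mathbf{Y}_N'(t)\mathbf{P}_N(t-1)\mathbf{D}_N(t-1) - \frac{1}{w}I(t)\mathbf{K}_N(t)\mathbf{Y}_N'(t)\mathbf{P}_N(t-1)\mathbf{Y}_N^*(t) \\ &= \mathbf{C}_N(t-1) + \mathbf{K}_N(t)\left[ I(t) - \mathbf{Y}_N'(t)\mathbf{C}_N(t-1) \right] \end{aligned} \tag{11-4-17}$$

Note that $\mathbf{Y}_N'(t)\mathbf{C}_N(t-1)$ is the output of the equalizer at time t, i.e., 

$$\hat{I}(t) = \mathbf{Y}_N'(t)\mathbf{C}_N(t-1) \tag{11-4-18}$$

and 

$$e_N(t, t-1) = I(t) - \hat{I}(t) \equiv e_N(t) \tag{11-4-19}$$

is the error between the desired symbol and the estimate. Hence, $\mathbf{C}_N(t)$ is 
updated recursively according to the relation 

$$\mathbf{C}_N(t) = \mathbf{C}_N(t-1) + \mathbf{K}_N(t)e_N(t) \tag{11-4-20}$$

The residual MSE resulting from this optimization is 

$$\mathscr{E}_{N\,\min}^{LS} = \sum_{n=0}^{t} w^{t-n}|I(n)|^2 - \mathbf{C}_N'(t)\mathbf{D}_N^*(t) \tag{11-4-21}$$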

To summarize, suppose we have $\mathbf{C}_N(t-1)$ and $\mathbf{P}_N(t-1)$. When a new 
signal component is received, we have $\mathbf{Y}_N(t)$. Then the recursive computation 
for the time update of $\mathbf{C}_N(t)$ and $\mathbf{P}_N(t)$ proceeds as follows: 

• compute output: 

$$\hat{I}(t) = \mathbf{Y}_N'(t)\mathbf{C}_N(t-1)$$

• compute error: 

$$e_N(t) = I(t) - \hat{I}(t)$$

• compute Kalman gain vector: 

$$\mathbf{K}_N(t) = \frac{\mathbf{P}_N(t-1)\mathbf{Y}_N^*(t)}{w + \mathbf{Y}_N'(t)\mathbf{P}_N(t-1)\mathbf{Y}_N^*(t)}$$

• update inverse of the correlation matrix: 

$$\mathbf{P}_N(t) = \frac{1}{w}\left[ \mathbf{P}_N(t-1) - \mathbf{K}_N(t)\mathbf{Y}_N'(t)\mathbf{P}_N(t-1) \right]$$

• update coefficients: 

$$\mathbf{C}_N(t) = \mathbf{C}_N(t-1) + \mathbf{K}_N(t)e_N(t) = \mathbf{C}_N(t-1) + \mathbf{P}_N(t)\mathbf{Y}_N^*(t)e_N(t) \tag{11-4-22}$$

FIGURE 11-4-1 Convergence rate of the Kalman and gradient algorithms (horizontal axis: number of iterations). 


The algorithm described by (11-4-22) is called the RLS direct form or Kalman 
algorithm. It is appropriate when the equalizer has a transversal (direct-form) 
structure. 
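
The five steps of (11-4-22) can be sketched as follows. This is an illustrative sketch under stated assumptions, not a production implementation: the function name `rls_update`, the real-valued BPSK training data, the noise level, and the values $w = 0.999$ and $\delta = 0.01$ are choices made here for the example. The channel taps and the 11-tap linear equalizer follow the example discussed in the text.

```python
import numpy as np

def rls_update(C, P, Y, I_t, w):
    """One time update of (11-4-22). C, Y are length-N vectors; P is N x N."""
    I_hat = Y @ C                        # output: Y'(t) C(t-1)
    e = I_t - I_hat                      # error e_N(t)
    PY = P @ np.conj(Y)
    K = PY / (w + Y @ PY)                # Kalman gain vector K_N(t)
    P = (P - np.outer(K, Y @ P)) / w     # inverse correlation matrix P_N(t)
    C = C + K * e                        # coefficient update
    return C, P, e

# Hypothetical setup: known (training) BPSK symbols through a three-tap channel.
rng = np.random.default_rng(1)
f = np.array([0.26, 0.93, 0.26])
K_half, N, w, delta = 5, 11, 0.999, 0.01
I = rng.choice([-1.0, 1.0], size=600)
v = np.convolve(I, f)[:I.size] + 0.01 * rng.standard_normal(I.size)

C = np.zeros(N)
P = np.eye(N) / delta                    # corresponds to starting R_N at delta * I_N
errors = []
for t in range(N, I.size - K_half):
    Y = v[t + K_half - np.arange(N)]     # y(t-j) = v_{t+K-j}, j = 0, ..., N-1
    C, P, e = rls_update(C, P, Y, I[t], w)
    errors.append(abs(e) ** 2)
```

Because the recursion is algebraically equivalent to solving (11-4-6) at each step, the final `C` should agree with a direct solve of $\mathbf{R}_N(t)\mathbf{C}_N(t) = \mathbf{D}_N(t)$ accumulated over the same data, up to roundoff.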

Note that the equalizer coefficients change with time by an amount equal to 
the error $e_N(t)$ multiplied by the Kalman gain vector $\mathbf{K}_N(t)$. Since $\mathbf{K}_N(t)$ is 
N-dimensional, each tap coefficient in effect is controlled by one of the 
elements of $\mathbf{K}_N(t)$. Consequently, rapid convergence is obtained. In contrast, 
the steepest-descent algorithm, expressed in our present notation, is 

$$\mathbf{C}_N(t) = \mathbf{C}_N(t-1) + \Delta\mathbf{Y}_N^*(t)e_N(t) \tag{11-4-23}$$

and the only variable parameter is the step size $\Delta$. 

Figure 11-4-1 illustrates the initial convergence rate of these two algorithms 
for a channel with fixed parameters $f_0 = 0.26$, $f_1 = 0.93$, $f_2 = 0.26$, and a linear 
equalizer with 11 taps. The eigenvalue ratio for this channel is $\lambda_{\max}/\lambda_{\min} = 11$. 
All the equalizer coefficients were initialized to zero. The steepest-descent 
algorithm was implemented with $\Delta = 0.020$. The superiority of the Kalman 
algorithm is clearly evident. This is especially important in tracking a 
time-variant channel. For example, the time variations in the characteristics of 
an (ionospheric) high-frequency (HF) radio channel are too rapid to be 
equalized by the gradient algorithm, but the Kalman algorithm adapts 
sufficiently rapidly to track such variations. 

In spite of its superior tracking performance, the Kalman algorithm 
described above has two disadvantages. One is its complexity. The second is 
its sensitivity to roundoff noise that accumulates due to the recursive 
computations. The latter may cause instabilities in the algorithm. 

The number of computations or operations (multiplications, divisions, and 
subtractions) in computing the variables in (11-4-22) is proportional to $N^2$. 
Most of these operations are involved in the updating of $\mathbf{P}_N(t)$. This part of the 
computation is also susceptible to roundoff noise. To remedy that problem, 
algorithms have been developed that avoid the computation of $\mathbf{P}_N(t)$ according 
to (11-4-14). The basis of these algorithms lies in the decomposition of $\mathbf{P}_N(t)$ in 
the form 

$$\mathbf{P}_N(t) = \mathbf{S}_N(t)\mathbf{\Lambda}_N(t)\mathbf{S}_N'(t) \tag{11-4-24}$$

where $\mathbf{S}_N(t)$ is a lower-triangular matrix whose diagonal elements are unity, 
and $\mathbf{\Lambda}_N(t)$ is a diagonal matrix. Such a decomposition is called a square-root 
factorization (see Bierman, 1977). This factorization is described in Appendix 
D. In a square-root algorithm, $\mathbf{P}_N(t)$ is not updated as in (11-4-14) nor is it 
computed. Instead, the time updating is performed on $\mathbf{S}_N(t)$ and $\mathbf{\Lambda}_N(t)$. 
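
To make the form of (11-4-24) concrete, the sketch below factors a given real, symmetric, positive-definite matrix into a unit lower-triangular matrix and a diagonal matrix. It illustrates only the decomposition itself; it is not the square-root time-update algorithm, which works on the factors directly. The function name `unit_lower_diag_factor` and the test matrix are assumptions made for the example, and real-valued data is assumed.

```python
import numpy as np

def unit_lower_diag_factor(P):
    """Factor real symmetric positive-definite P as S @ diag(lam) @ S.T,
    where S is lower triangular with unit diagonal."""
    n = P.shape[0]
    S = np.eye(n)
    lam = np.zeros(n)
    for j in range(n):
        # Diagonal element: remove the contribution of earlier columns.
        lam[j] = P[j, j] - np.sum(S[j, :j] ** 2 * lam[:j])
        for i in range(j + 1, n):
            S[i, j] = (P[i, j] - np.sum(S[i, :j] * S[j, :j] * lam[:j])) / lam[j]
    return S, lam

# Hypothetical positive-definite matrix standing in for P_N(t).
rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
P = M @ M.T + np.eye(4)
S, lam = unit_lower_diag_factor(P)
```

Keeping the diagonal factor positive is what lets a square-root algorithm preserve the positive definiteness that the direct update of (11-4-14) can lose through roundoff.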

Square-root algorithms are frequently used in control systems applications 
in which Kalman filtering is involved. In digital communications, the square- 
root Kalman algorithm has been implemented in a decision-feedback-equalized 
PSK modem designed to transmit at high speed over HF radio channels with a 
nominal 3 kHz bandwidth. This algorithm is described in the paper by Hsu 
(1982). It has a computational complexity of $1.5N^2 + 6.5N$ (complex-valued 
multiplications and divisions per output symbol). It is also numerically stable 
and exhibits good numerical properties. For a detailed discussion of 
square-root algorithms in sequential estimation, the reader is referred to the book by 
Bierman (1977). 

It is also possible to derive RLS algorithms with computational complexities 
that grow linearly with the number N of equalizer coefficients. Such algorithms 
are generally called fast RLS algorithms and have been described in the papers 
by Carayannis et al. (1983), Cioffi and Kailath (1984), and Slock and Kailath 
(1988). 


11-4-2 Linear Prediction and the Lattice Filter 

In Chapter 3, we considered the linear prediction of a signal, in the context of 
speech encoding. In this section, we shall establish the connection between 
linear prediction and a lattice filter. 

The linear prediction problem may be stated as follows: given a set of data 
$y(t-1), y(t-2), \ldots, y(t-p)$, predict the value of the next data point $y(t)$. 
The predictor of order p is 

$$\hat{y}(t) = \sum_{k=1}^{p} a_{pk}\, y(t-k) \tag{11-4-25}$$

Minimization of the MSE, defined as 

$$\mathscr{E}_p = E[y(t) - \hat{y}(t)]^2 = E\left[ y(t) - \sum_{k=1}^{p} a_{pk}\, y(t-k) \right]^2 \tag{11-4-26}$$

with respect to the predictor coefficients $\{a_{pk}\}$ yields the set of linear equations 

$$\sum_{k=1}^{p} a_{pk}\,\phi(k-l) = \phi(l), \qquad l = 1, 2, \ldots, p \tag{11-4-27}$$

where 

$$\phi(l) = E[y(t)y(t+l)]$$

These are called the normal equations or the Yule-Walker equations. 

The matrix $\mathbf{\Phi}$ with elements $\phi(k-l)$ is a Toeplitz matrix, and, hence, the 
Levinson-Durbin algorithm described in Appendix A provides an efficient 
means for solving the linear equations recursively, starting with a first-order 
predictor and proceeding recursively to the solution of the coefficients for the 
predictor of order p. The recursive relations for the Levinson-Durbin 
algorithm are 

$$\begin{aligned} a_{11} &= \frac{\phi(1)}{\phi(0)}, \qquad \mathscr{E}_0 = \phi(0) \\ a_{mm} &= \frac{\phi(m) - \mathbf{A}_{m-1}'\boldsymbol{\phi}_{m-1}^{r}}{\mathscr{E}_{m-1}} \\ a_{mk} &= a_{m-1\,k} - a_{mm}\,a_{m-1\,m-k} \\ \mathscr{E}_m &= \mathscr{E}_{m-1}(1 - a_{mm}^2) \end{aligned} \tag{11-4-28}$$

for $m = 1, 2, \ldots, p$, where the vectors $\mathbf{A}_{m-1}$ and $\boldsymbol{\phi}_{m-1}^{r}$ are defined as 

$$\mathbf{A}_{m-1} = [a_{m-1\,1} \;\; a_{m-1\,2} \;\; \cdots \;\; a_{m-1\,m-1}]'$$

$$\boldsymbol{\phi}_{m-1}^{r} = [\phi(m-1) \;\; \phi(m-2) \;\; \cdots \;\; \phi(1)]'$$
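
The recursion in (11-4-28) can be sketched in a few lines. The function name `levinson_durbin`, the array layout, and the example autocorrelation sequence are assumptions made for illustration; real-valued data is assumed.

```python
import numpy as np

def levinson_durbin(phi, p):
    """Solve the normal equations (11-4-27) order-recursively.
    phi[l] holds phi(l) for l = 0..p. Returns (a, E), where a[m, k] = a_mk
    for 1 <= k <= m and E[m] is the mth-order prediction MSE."""
    a = np.zeros((p + 1, p + 1))
    E = np.zeros(p + 1)
    E[0] = phi[0]
    for m in range(1, p + 1):
        # A'_{m-1} phi^r_{m-1} = sum_{k=1}^{m-1} a_{m-1,k} phi(m-k)
        acc = sum(a[m - 1, k] * phi[m - k] for k in range(1, m))
        a[m, m] = (phi[m] - acc) / E[m - 1]
        for k in range(1, m):
            a[m, k] = a[m - 1, k] - a[m, m] * a[m - 1, m - k]
        E[m] = E[m - 1] * (1.0 - a[m, m] ** 2)
    return a, E

# Hypothetical autocorrelation sequence (a sum of two valid sequences).
p = 4
lags = np.arange(p + 1)
phi = 0.6 * 0.9 ** lags + 0.4 * (-0.5) ** lags
a, E = levinson_durbin(phi, p)
```

The last row `a[p, 1:]` should match a direct solution of the Toeplitz system in (11-4-27), while the intermediate rows give the lower-order predictors at no extra cost.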

The linear prediction filter of order m may be realized as a transversal filter 
with transfer function 

$$A_m(z) = 1 - \sum_{k=1}^{m} a_{mk} z^{-k} \tag{11-4-29}$$

Its input is the data $\{y(t)\}$ and its output is the error $e(t) = y(t) - \hat{y}(t)$. The 
prediction filter can also be realized in the form of a lattice, as we now 
demonstrate. 

Our starting point is the use of the Levinson-Durbin algorithm for the 
predictor coefficients $a_{mk}$ in (11-4-29). This substitution yields 

$$\begin{aligned} A_m(z) &= 1 - \sum_{k=1}^{m-1} (a_{m-1\,k} - a_{mm}\,a_{m-1\,m-k}) z^{-k} - a_{mm} z^{-m} \\ &= 1 - \sum_{k=1}^{m-1} a_{m-1\,k} z^{-k} - a_{mm} z^{-m}\left( 1 - \sum_{k=1}^{m-1} a_{m-1\,k} z^{k} \right) \\ &= A_{m-1}(z) - a_{mm} z^{-m} A_{m-1}(z^{-1}) \end{aligned} \tag{11-4-30}$$

Thus we have the transfer function of the mth-order predictor in terms of the 
transfer function of the $(m-1)$th-order predictor. 

Now suppose we define a filter with transfer function $G_m(z)$ as 

$$G_m(z) = z^{-m} A_m(z^{-1}) \tag{11-4-31}$$

Then (11-4-30) may be expressed as 

$$A_m(z) = A_{m-1}(z) - a_{mm} z^{-1} G_{m-1}(z) \tag{11-4-32}$$

Note that $G_{m-1}(z)$ represents a transversal filter with tap coefficients 
$(-a_{m-1\,m-1}, -a_{m-1\,m-2}, \ldots, -a_{m-1\,1}, 1)$, while the coefficients of $A_{m-1}(z)$ 
are exactly the same except that they are given in reverse order. 

More insight into the relationship between $A_m(z)$ and $G_m(z)$ can be 
obtained by computing the output of these two filters to an input sequence 
$y(t)$. Using z-transform relations, we have 

$$A_m(z)Y(z) = A_{m-1}(z)Y(z) - a_{mm} z^{-1} G_{m-1}(z)Y(z) \tag{11-4-33}$$

We define the outputs of the filters as 

$$F_m(z) = A_m(z)Y(z), \qquad B_m(z) = G_m(z)Y(z) \tag{11-4-34}$$

Then (11-4-33) becomes 

$$F_m(z) = F_{m-1}(z) - a_{mm} z^{-1} B_{m-1}(z) \tag{11-4-35}$$

In the time domain, the relation in (11-4-35) becomes 

$$f_m(t) = f_{m-1}(t) - a_{mm} b_{m-1}(t-1), \qquad m \ge 1 \tag{11-4-36}$$

where 

$$f_m(t) = y(t) - \sum_{k=1}^{m} a_{mk}\, y(t-k) \tag{11-4-37}$$

$$b_m(t) = y(t-m) - \sum_{k=1}^{m} a_{mk}\, y(t-m+k) \tag{11-4-38}$$

To elaborate, $f_m(t)$ in (11-4-37) represents the error of an mth-order forward 
predictor, while $b_m(t)$ represents the error of an mth-order backward 
predictor. 

The relation in (11-4-36) is one of two that specify a lattice filter. The 
second relation is obtained from $G_m(z)$ as follows: 

$$\begin{aligned} G_m(z) &= z^{-m} A_m(z^{-1}) \\ &= z^{-m}\left[ A_{m-1}(z^{-1}) - a_{mm} z^{m} A_{m-1}(z) \right] \\ &= z^{-1} G_{m-1}(z) - a_{mm} A_{m-1}(z) \end{aligned} \tag{11-4-39}$$

Now, if we multiply both sides of (11-4-39) by $Y(z)$ and express the result in 
terms of $F_m(z)$ and $B_m(z)$ using the definitions in (11-4-34), we obtain 

$$B_m(z) = z^{-1} B_{m-1}(z) - a_{mm} F_{m-1}(z) \tag{11-4-40}$$

By transforming (11-4-40) into the time domain, we obtain the second relation 
that corresponds to the lattice filter, namely, 

$$b_m(t) = b_{m-1}(t-1) - a_{mm} f_{m-1}(t), \qquad m \ge 1 \tag{11-4-41}$$

FIGURE 11-4-2 A lattice filter. 

The initial condition is 

$$f_0(t) = b_0(t) = y(t) \tag{11-4-42}$$

The lattice filter described by the recursive relations in (11-4-36) and (11-4-41) 
is illustrated in Fig. 11-4-2. Each stage is characterized by its own multiplication 
factor $\{a_{ii}\}$, $i = 1, 2, \ldots, m$, which is defined in the Levinson-Durbin algorithm. 
The forward and backward errors $f_m(t)$ and $b_m(t)$ are usually called the 
residuals. The mean square value of these residuals is 

$$\mathscr{E}_m = E[f_m^2(t)] = E[b_m^2(t)] \tag{11-4-43}$$

$\mathscr{E}_m$ is given recursively, as indicated in the Levinson-Durbin algorithm, by 

$$\mathscr{E}_m = \mathscr{E}_{m-1}(1 - a_{mm}^2) = \mathscr{E}_0 \prod_{i=1}^{m} (1 - a_{ii}^2) \tag{11-4-44}$$

where $\mathscr{E}_0 = \phi(0)$. 
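
The two lattice relations (11-4-36) and (11-4-41), with the initial condition (11-4-42), can be sketched as below. The function names `lattice_residuals` and `step_up`, the zero initial state, and the example multiplication factors are assumptions made for illustration. The helper `step_up` rebuilds the transversal coefficients $a_{pk}$ from the factors $a_{mm}$ using the third relation of the Levinson-Durbin recursion, so the two realizations can be compared.

```python
import numpy as np

def lattice_residuals(y, refl):
    """Run y(t) through a lattice with stage factors refl[m-1] = a_mm.
    Returns the forward and backward residuals of the last stage,
    assuming the filter starts from a zero state."""
    f = np.asarray(y, dtype=float).copy()    # f_0(t) = y(t)
    b = f.copy()                             # b_0(t) = y(t)
    for a_mm in refl:
        b_delayed = np.concatenate(([0.0], b[:-1]))   # b_{m-1}(t-1)
        f_new = f - a_mm * b_delayed                  # (11-4-36)
        b = b_delayed - a_mm * f                      # (11-4-41)
        f = f_new
    return f, b

def step_up(refl):
    """Transversal coefficients [a_p1, ..., a_pp] from the stage factors."""
    a = np.zeros(0)
    for a_mm in refl:
        a = np.concatenate((a - a_mm * a[::-1], [a_mm]))
    return a

# Hypothetical stage factors and input data.
rng = np.random.default_rng(4)
y = rng.standard_normal(50)
refl = [0.5, -0.3, 0.2]
f_lat, b_lat = lattice_residuals(y, refl)
a = step_up(refl)
```

Filtering the same data with the transversal forms $A_p(z)$ and $G_p(z)$ should reproduce `f_lat` and `b_lat`, which is the equivalence the derivation establishes.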

The residuals $\{f_m(t)\}$ and $\{b_m(t)\}$ satisfy a number of interesting properties, 
as described by Makhoul (1978). Most important of these are the orthogonality 
properties 

$$\begin{aligned} E[b_m(t)b_n(t)] &= \mathscr{E}_m \delta_{mn} \\ E[f_m(t+m)f_n(t+n)] &= \mathscr{E}_m \delta_{mn} \end{aligned} \tag{11-4-45}$$


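Furthermore, the cross-correlation between $f_m(t)$ and $b_n(t)$ follows from the 
definitions (11-4-37) and (11-4-38) and the orthogonality of $f_m(t)$ to 
$y(t-1), \ldots, y(t-m)$: 

$$E[f_m(t)b_n(t)] = \begin{cases} -a_{nn}\mathscr{E}_m & (m \ge n \ge 1) \\ \mathscr{E}_m & (n = 0) \\ 0 & (m < n) \end{cases} \qquad m, n \ge 0 \tag{11-4-46}$$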


As a consequence of the orthogonality properties of the residuals, the 
different sections of the lattice exhibit a form of independence that allows us to 
add or delete one or more of the last stages without affecting the parameters of 
the remaining stages. Since the residual mean square error % m decreases 
monotonically with the number of sections, % m can be used as a performance 
index in determining where the lattice should be terminated. 

From the above discussion, we observe that a linear prediction filter can be 
implemented either as a linear transversal filter or as a lattice filter. The lattice 
filter is order-recursive, and, as a consequence, the number of sections it 
contains can be easily increased or decreased without affecting the parameters 
of the remaining sections. In contrast, the coefficients of a transversal filter 
obtained on the basis of the RLS criterion are interdependent. This means that 
an increase or a decrease in the size of the filter results in a change in all 
coefficients. Consequently, the Kalman algorithm described in Section 11-4-1 is 
recursive in time but not in order. 

Based on least-squares optimization, RLS lattice algorithms have been 
developed whose computational complexity grows linearly with the number N 
of filter coefficients (lattice stages). Hence, the lattice equalizer structure is 
computationally competitive with the direct-form fast RLS equalizer algo- 
rithms. RLS lattice algorithms are described in the papers by Morf et al. 
(1973), Satorius and Alexander (1979), Satorius and Pack (1981), Ling and 
Proakis (1984), and Ling et al. (1986). 

RLS lattice algorithms have the distinct feature of being numerically robust 
to round-off error inherent in digital implementations of the algorithm. A 
treatment of their numerical properties may be found in the papers by Ling et 
al. (1984, 1986). 


11-5 SELF-RECOVERING (BLIND) EQUALIZATION 

In the conventional zero-forcing or minimum MSE equalizers, we assumed that 
a known training sequence is transmitted to the receiver for the purpose of 
initially adjusting the equalizer coefficients. However, there are some 
applications, such as multipoint communication networks, where it is desirable for the 
receiver to synchronize to the received signal and to adjust the equalizer 
without having a known training sequence available. Equalization techniques 
based on initial adjustment of the coefficients without the benefit of a training 
sequence are said to be self-recovering or blind. 

Beginning with the paper by Sato (1975), three different classes of adaptive 
blind equalization algorithms have been developed over the past two decades. 
One class of algorithms is based on steepest descent for adaptation of the 
equalizer. A second class of algorithms is based on the use of second- and 
higher-order (generally, fourth-order) statistics of the received signal to 
estimate the channel characteristics and to design the equalizer. More recently, 
a third class of blind equalization algorithms based on the maximum-likelihood 
criterion has been investigated. In this section, we briefly describe these 
approaches and give several relevant references to the literature. 


11-5-1 Blind Equalization Based on Maximum-Likelihood 
Criterion 

It is convenient to use the equivalent discrete-time channel model described in 
Section 10-1-2. Recall that the output of this channel model with ISI is 

$$v_n = \sum_{k=0}^{L} f_k I_{n-k} + \eta_n \tag{11-5-1}$$

where $\{f_k\}$ are the equivalent discrete-time channel coefficients, $\{I_n\}$ represents 
the information sequence, and $\{\eta_n\}$ is a white Gaussian noise sequence. 

For a block of N received data points, the (joint) probability density 
function of the received data vector $\mathbf{v} = [v_1 \; v_2 \; \cdots \; v_N]'$ conditioned on 
knowing the impulse response vector $\mathbf{f} = [f_0 \; f_1 \; \cdots \; f_L]'$ and the data vector 
$\mathbf{I} = [I_1 \; I_2 \; \cdots \; I_N]'$ is 

$$p(\mathbf{v} \mid \mathbf{f}, \mathbf{I}) = \frac{1}{(2\pi\sigma^2)^N} \exp\left( -\frac{1}{2\sigma^2} \sum_{n=1}^{N} \left| v_n - \sum_{k=0}^{L} f_k I_{n-k} \right|^2 \right) \tag{11-5-2}$$


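The joint maximum-likelihood estimates of $\mathbf{f}$ and $\mathbf{I}$ are the values of these 
vectors that maximize the joint probability density function $p(\mathbf{v} \mid \mathbf{f}, \mathbf{I})$ or, 
equivalently, the values of $\mathbf{f}$ and $\mathbf{I}$ that minimize the term in the exponent. 
Hence, the ML solution is simply the minimum over $\mathbf{f}$ and $\mathbf{I}$ of the metric 

$$DM(\mathbf{I}, \mathbf{f}) = \sum_{n=1}^{N} \left| v_n - \sum_{k=0}^{L} f_k I_{n-k} \right|^2 = \|\mathbf{v} - \mathbf{A}\mathbf{f}\|^2 \tag{11-5-3}$$

where the matrix $\mathbf{A}$ is called the data matrix and is defined as 

$$\mathbf{A} = \begin{bmatrix} I_1 & 0 & 0 & \cdots & 0 \\ I_2 & I_1 & 0 & \cdots & 0 \\ I_3 & I_2 & I_1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ I_N & I_{N-1} & I_{N-2} & \cdots & I_{N-L} \end{bmatrix} \tag{11-5-4}$$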


We make several observations. First of all, we note that when the data 
vector $\mathbf{I}$ (or the data matrix $\mathbf{A}$) is known, as is the case when a training 
sequence is available at the receiver, the ML channel impulse response 
estimate obtained by minimizing (11-5-3) over $\mathbf{f}$ is 

$$\mathbf{f}_{ML}(\mathbf{I}) = (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{v} \tag{11-5-5}$$

On the other hand, when the channel impulse response f is known, the 
optimum ML detector for the data sequence I performs a trellis search (or tree 
search) by utilizing the Viterbi algorithm for the ISI channel. 
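
For the known-data case, the estimate in (11-5-5) is an ordinary least-squares fit against the data matrix of (11-5-4). The sketch below assumes real-valued symbols; the function names `data_matrix` and `ml_channel_estimate` and the noiseless example data are illustrative choices, and the least-squares problem is solved with `numpy.linalg.lstsq` rather than by forming the inverse explicitly.

```python
import numpy as np

def data_matrix(I, L):
    """N x (L+1) data matrix of (11-5-4): row n holds I_n, I_{n-1}, ..., I_{n-L},
    with zeros where the index falls before the first symbol."""
    I = np.asarray(I, dtype=float)
    A = np.zeros((I.size, L + 1))
    for k in range(L + 1):
        A[k:, k] = I[:I.size - k]
    return A

def ml_channel_estimate(v, I, L):
    """Least-squares solution of min_f ||v - A f||^2, i.e. (11-5-5)."""
    A = data_matrix(I, L)
    f_hat, *_ = np.linalg.lstsq(A, v, rcond=None)
    return f_hat

# Hypothetical training block, noiseless for clarity.
rng = np.random.default_rng(5)
f_true = np.array([0.26, 0.93, 0.26])
I_train = rng.choice([-1.0, 1.0], size=50)
v = data_matrix(I_train, L=2) @ f_true
f_hat = ml_channel_estimate(v, I_train, L=2)
```

In the noiseless case the fit recovers the channel exactly; with noise, the same call returns the ML estimate for the assumed Gaussian model.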

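When neither $\mathbf{I}$ nor $\mathbf{f}$ is known, the minimization of the performance index 
$DM(\mathbf{I}, \mathbf{f})$ may be performed jointly over $\mathbf{I}$ and $\mathbf{f}$. Alternatively, $\mathbf{f}$ may be 
estimated from the probability density function $p(\mathbf{v} \mid \mathbf{f})$, which may be obtained 
by averaging $p(\mathbf{v}, \mathbf{I} \mid \mathbf{f})$ over all possible data sequences. That is, 

$$p(\mathbf{v} \mid \mathbf{f}) = \sum_{m} p(\mathbf{v}, \mathbf{I}^{(m)} \mid \mathbf{f}) = \sum_{m} p(\mathbf{v} \mid \mathbf{I}^{(m)}, \mathbf{f}) P(\mathbf{I}^{(m)}) \tag{11-5-6}$$

where $P(\mathbf{I}^{(m)})$ is the probability of the sequence $\mathbf{I} = \mathbf{I}^{(m)}$, for $m = 1, 2, \ldots, M^N$, 
and M is the size of the signal constellation. 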


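Channel Estimation Based on Average over Data Sequences  As indicated 
in the above discussion, when both $\mathbf{I}$ and $\mathbf{f}$ are unknown, one approach is 
to estimate the impulse response $\mathbf{f}$ after averaging the probability density 
$p(\mathbf{v}, \mathbf{I} \mid \mathbf{f})$ over all possible data sequences. Thus, we have 

$$p(\mathbf{v} \mid \mathbf{f}) = \sum_{m} \left[ \frac{1}{(2\pi\sigma^2)^N} \exp\left( -\frac{\|\mathbf{v} - \mathbf{A}^{(m)}\mathbf{f}\|^2}{2\sigma^2} \right) \right] P(\mathbf{I}^{(m)}) \tag{11-5-7}$$

Then, the estimate of $\mathbf{f}$ that maximizes $p(\mathbf{v} \mid \mathbf{f})$ is the solution of the equation 

$$\sum_{m} P(\mathbf{I}^{(m)}) \left( \mathbf{A}^{(m)\prime}\mathbf{A}^{(m)}\mathbf{f} - \mathbf{A}^{(m)\prime}\mathbf{v} \right) \exp\left( -\frac{\|\mathbf{v} - \mathbf{A}^{(m)}\mathbf{f}\|^2}{2\sigma^2} \right) = 0 \tag{11-5-8}$$

Hence, the estimate of $\mathbf{f}$ may be expressed as 

$$\mathbf{f} = \left[ \sum_{m} P(\mathbf{I}^{(m)}) \mathbf{A}^{(m)\prime}\mathbf{A}^{(m)} g(\mathbf{v}, \mathbf{A}^{(m)}, \mathbf{f}) \right]^{-1} \sum_{m} P(\mathbf{I}^{(m)}) g(\mathbf{v}, \mathbf{A}^{(m)}, \mathbf{f}) \mathbf{A}^{(m)\prime}\mathbf{v} \tag{11-5-9}$$

where the function $g(\mathbf{v}, \mathbf{A}^{(m)}, \mathbf{f})$ is defined as 

$$g(\mathbf{v}, \mathbf{A}^{(m)}, \mathbf{f}) = \exp\left( -\frac{\|\mathbf{v} - \mathbf{A}^{(m)}\mathbf{f}\|^2}{2\sigma^2} \right) \tag{11-5-10}$$

The resulting solution for the optimum $\mathbf{f}$ is denoted by $\mathbf{f}_{ML}$. 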

Equation (11-5-9) is a nonlinear equation for the estimate of the channel 
impulse response, given the received signal vector $\mathbf{v}$. It is generally difficult to 
obtain the optimum solution by solving (11-5-9) directly. On the other hand, it 
is relatively simple to devise a numerical method that solves for $\mathbf{f}_{ML}$ 
recursively. Specifically, we may write 

$$\mathbf{f}^{(k+1)} = \left[ \sum_{m} P(\mathbf{I}^{(m)}) \mathbf{A}^{(m)\prime}\mathbf{A}^{(m)} g(\mathbf{v}, \mathbf{A}^{(m)}, \mathbf{f}^{(k)}) \right]^{-1} \sum_{m} P(\mathbf{I}^{(m)}) g(\mathbf{v}, \mathbf{A}^{(m)}, \mathbf{f}^{(k)}) \mathbf{A}^{(m)\prime}\mathbf{v} \tag{11-5-11}$$

Once $\mathbf{f}_{ML}$ is obtained from the solution of (11-5-9) or (11-5-11), we may 
simply use the estimate in the minimization of the metric $DM(\mathbf{I}, \mathbf{f}_{ML})$, given by 
(11-5-3), over all the possible data sequences. Thus, $\hat{\mathbf{I}}_{ML}$ is the sequence $\mathbf{I}$ that 
minimizes $DM(\mathbf{I}, \mathbf{f}_{ML})$, i.e., 

$$\min_{\mathbf{I}} DM(\mathbf{I}, \mathbf{f}_{ML}) = \min_{\mathbf{I}} \|\mathbf{v} - \mathbf{A}\mathbf{f}_{ML}\|^2 \tag{11-5-12}$$

We know that the Viterbi algorithm is the computationally efficient algorithm 
for performing the minimization of $DM(\mathbf{I}, \mathbf{f}_{ML})$ over $\mathbf{I}$. 

This algorithm has two major drawbacks. First, the recursion for $\mathbf{f}_{ML}$ given 
by (11-5-11) is computationally intensive. Second, and, perhaps, more 
importantly, the estimate $\mathbf{f}_{ML}$ is not as good as the maximum-likelihood estimate 
$\mathbf{f}_{ML}(\mathbf{I})$ that is obtained when the sequence $\mathbf{I}$ is known. Consequently, the error 
rate performance of the blind equalizer (the Viterbi algorithm) based on the 
estimate $\mathbf{f}_{ML}$ is poorer than that based on $\mathbf{f}_{ML}(\mathbf{I})$. Next, we consider joint 
channel and data estimation. 
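
The recursion (11-5-11) can be tried on a very short block, where all $M^N$ sequences can be enumerated. The sketch below is only an illustration under several assumptions: binary symbols $\pm 1$, equally likely sequences (so $P(\mathbf{I}^{(m)})$ cancels), a known noise variance, real-valued data, and a starting guess with a single unit tap. All function names are hypothetical. The weights are computed in the log domain to avoid underflow; the common scale factor cancels between the two sums.

```python
import itertools
import numpy as np

def data_matrix(I, L):
    """N x (L+1) data matrix of (11-5-4), with zeros before the first symbol."""
    I = np.asarray(I, dtype=float)
    A = np.zeros((I.size, L + 1))
    for k in range(L + 1):
        A[k:, k] = I[:I.size - k]
    return A

def log_weights(v, f, mats, sigma2):
    """log g(v, A^(m), f) of (11-5-10) for every candidate data matrix."""
    return np.array([-np.sum((v - A @ f) ** 2) / (2.0 * sigma2) for A in mats])

def log_likelihood(v, f, mats, sigma2):
    """log of sum_m g(v, A^(m), f): log p(v | f) up to an additive constant."""
    lg = log_weights(v, f, mats, sigma2)
    return lg.max() + np.log(np.sum(np.exp(lg - lg.max())))

def blind_channel_recursion(v, mats, sigma2, f0, iters=20):
    """Iterate (11-5-11) for equally likely sequences."""
    f = np.asarray(f0, dtype=float)
    for _ in range(iters):
        lg = log_weights(v, f, mats, sigma2)
        g = np.exp(lg - lg.max())            # common scale factor cancels
        lhs = sum(gm * (A.T @ A) for gm, A in zip(g, mats))
        rhs = sum(gm * (A.T @ v) for gm, A in zip(g, mats))
        f = np.linalg.solve(lhs, rhs)
    return f

# Hypothetical short block: N = 8 binary symbols, two-tap channel.
rng = np.random.default_rng(2)
L, N, sigma = 1, 8, 0.02
f_true = np.array([1.0, 0.5])
I_true = rng.choice([-1.0, 1.0], size=N)
v = data_matrix(I_true, L) @ f_true + sigma * rng.standard_normal(N)

mats = [data_matrix(s, L) for s in itertools.product((-1.0, 1.0), repeat=N)]
f0 = np.array([1.0, 0.0])
f_ml = blind_channel_recursion(v, mats, sigma ** 2, f0)
```

Even for this toy block the enumeration covers 256 sequences per iteration, which illustrates why the recursion is described as computationally intensive. Note also that $\mathbf{f}$ and $-\mathbf{f}$ are equally likely when the symbol alphabet is symmetric, so the sign of the result depends on the starting guess.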


Joint Channel and Data Estimation  Here, we consider the joint 
optimization of the performance index $DM(\mathbf{I}, \mathbf{f})$ given by (11-5-3). Since the elements 
of the impulse response vector $\mathbf{f}$ are continuous and the elements of the data 
vector $\mathbf{I}$ are discrete, one approach is to determine the maximum-likelihood 
estimate of $\mathbf{f}$ for each possible data sequence and, then, to select the data 
sequence that minimizes $DM(\mathbf{I}, \mathbf{f})$ for each corresponding channel estimate. 
Thus, the channel estimate corresponding to the mth data sequence $\mathbf{I}^{(m)}$ is 

$$\mathbf{f}_{ML}(\mathbf{I}^{(m)}) = (\mathbf{A}^{(m)\prime}\mathbf{A}^{(m)})^{-1}\mathbf{A}^{(m)\prime}\mathbf{v} \tag{11-5-13}$$

For the mth data sequence, the metric $DM(\mathbf{I}, \mathbf{f})$ becomes 

$$DM(\mathbf{I}^{(m)}, \mathbf{f}_{ML}(\mathbf{I}^{(m)})) = \|\mathbf{v} - \mathbf{A}^{(m)}\mathbf{f}_{ML}(\mathbf{I}^{(m)})\|^2 \tag{11-5-14}$$

Then, from the set of $M^N$ possible sequences, we select the data sequence that 
minimizes the cost function in (11-5-14), i.e., we determine 

$$\min_{\mathbf{I}^{(m)}} DM(\mathbf{I}^{(m)}, \mathbf{f}_{ML}(\mathbf{I}^{(m)})) \tag{11-5-15}$$

The approach described above is an exhaustive computational search 
method with a computational complexity that grows exponentially with the 
length of the data block. We may select $N = L$, and, thus, we shall have one 
channel estimate for each of the $M^L$ surviving sequences. Thereafter, we may 
continue to maintain a separate channel estimate for each surviving path of the 
Viterbi algorithm search through the trellis. 
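
The exhaustive search of (11-5-13) to (11-5-15) can be sketched for a short block as follows. The function names, the binary alphabet, the two-tap channel, and the noiseless received block are assumptions made for illustration; real-valued data is assumed.

```python
import itertools
import numpy as np

def data_matrix(I, L):
    """N x (L+1) data matrix of (11-5-4), with zeros before the first symbol."""
    I = np.asarray(I, dtype=float)
    A = np.zeros((I.size, L + 1))
    for k in range(L + 1):
        A[k:, k] = I[:I.size - k]
    return A

def joint_ml_search(v, L, alphabet=(-1.0, 1.0)):
    """Try every candidate sequence, fit the channel by least squares (11-5-13),
    and keep the sequence with the smallest metric (11-5-14)."""
    best_metric, best_I, best_f = np.inf, None, None
    for s in itertools.product(alphabet, repeat=len(v)):
        A = data_matrix(s, L)
        f_hat, *_ = np.linalg.lstsq(A, v, rcond=None)
        metric = np.sum((v - A @ f_hat) ** 2)
        if metric < best_metric:
            best_metric, best_I, best_f = metric, np.array(s), f_hat
    return best_metric, best_I, best_f

# Hypothetical noiseless block of N = 8 binary symbols.
rng = np.random.default_rng(6)
f_true = np.array([1.0, 0.5])
I_true = rng.choice([-1.0, 1.0], size=8)
v = data_matrix(I_true, 1) @ f_true
metric, I_hat, f_hat = joint_ml_search(v, L=1)
```

The pair $(\mathbf{I}, \mathbf{f})$ and its negative $(-\mathbf{I}, -\mathbf{f})$ give the same metric, so a blind search of this kind can recover the data only up to a sign (more generally, up to a symmetry of the constellation). The loop over $M^N$ candidates makes the exponential growth in complexity explicit.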

A similar approach has been proposed by Seshadri (1991). In essence. 
Seshadn s algorithm is a type of generalized Viterbi algorithm (GVA) that 
retains K > 1 best estimates of the transmitted data sequence into each state 
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of the trellis and the corresponding channel estimates. In Seshadri’s GVA, the 
search is identical to the conventional VA from the beginning up to the L stage 
of the trellis, i.e., up to the point where the received sequence (u,, v 2 , . . . , v L ) 
has been processed. Hence, up to the L stage, an exhaustive search is 
performed. Associated with each data sequence I (m> , there is a corresponding 
channel estimate From this stage on, the search is modified, to retain 

K=e 1 surviving sequences and associated channel estimates per state instead of 
only one sequence per state. Thus, the GVA is used for processing the 
received signal sequence {u„, n ^ L + 1}. The channel estimate is updated 
recursively at each stage using the LMS algorithm to further reduce the 
computational complexity. Simulation results given in the paper by Seshadri 
(1991) indicate that this GVA blind equalization algorithm performs rather 
well at moderate signal-to-noise ratios with K = 4. Hence, there is a modest 
increase in the computational complexity of the GVA compared with that for 
the conventional VA. However, there are additional computations involved 
with the estimation and updating of the channel estimates $\mathbf{f}(\mathbf{I}^{(m)})$ associated
with each of the surviving data estimates. 

An alternative joint estimation algorithm that avoids the least-squares 
computation for channel estimation has been devised by Zervas et al. (1991). 
In this algorithm, the order for performing the joint minimization of the 
performance index $DM(\mathbf{I}, \mathbf{f})$ is reversed. That is, a channel impulse response,
say $\mathbf{f} = \mathbf{f}^{(1)}$, is selected and then the conventional VA is used to find the
optimum sequence for this channel impulse response. Then, we may modify $\mathbf{f}^{(1)}$
in some manner to $\mathbf{f}^{(2)} = \mathbf{f}^{(1)} + \Delta\mathbf{f}^{(1)}$ and repeat the optimization over the data
sequences $\{\mathbf{I}^{(m)}\}$.

Based on this general approach, Zervas developed a new ML blind 
equalization algorithm, which is called a quantized-channel algorithm. The 
algorithm operates over a grid in the channel space, which becomes finer and 
finer by using the ML criterion to confine the estimated channel in the 
neighborhood of the original unknown channel. This algorithm leads to an 
efficient parallel implementation, and its storage requirements are only those 
of the VA. 

11-5-2 Stochastic Gradient Algorithm 

Another class of blind equalization algorithms consists of stochastic-gradient iterative
equalization schemes that apply a memoryless nonlinearity to the output of a
linear FIR equalization filter in order to generate the “desired response” in 
each iteration. 

Let us begin with an initial guess of the coefficients of the optimum 
equalizer, which we denote by $\{c_n\}$. Then, the convolution of the channel
response with the equalizer response may be expressed as

$$ \{c_n\} \star \{f_n\} = \{\delta_n\} + \{e_n\} \tag{11-5-16} $$

where $\{\delta_n\}$ is the unit sample sequence and $\{e_n\}$ denotes the error sequence
that results from our initial guess of the equalizer coefficients. If we convolve
the equalizer impulse response with the received sequence $\{v_n\}$, we obtain

$$ \begin{aligned} \{\hat{I}_n\} &= \{v_n\} \star \{c_n\} \\ &= \{I_n\} \star \{f_n\} \star \{c_n\} + \{\eta_n\} \star \{c_n\} \\ &= \{I_n\} \star (\{\delta_n\} + \{e_n\}) + \{\eta_n\} \star \{c_n\} \\ &= \{I_n\} + \{I_n\} \star \{e_n\} + \{\eta_n\} \star \{c_n\} \end{aligned} \tag{11-5-17} $$

The term $\{I_n\}$ in (11-5-17) represents the desired data sequence, the term
$\{I_n\} \star \{e_n\}$ represents the residual ISI, and the term $\{\eta_n\} \star \{c_n\}$ represents the
additive noise. Our problem is to utilize the deconvolved sequence $\{\hat{I}_n\}$ to find
the “best” estimate of a desired response, denoted in general by $\{d_n\}$. In the
case of adaptive equalization using a training sequence, $\{d_n\} = \{I_n\}$. In a blind
equalization mode, we shall generate a desired response from $\{\hat{I}_n\}$.

The mean square error (MSE) criterion may be employed to determine the
“best” estimate of $\{I_n\}$ from the observed equalizer output $\{\hat{I}_n\}$. Since the
transmitted sequence $\{I_n\}$ has a nongaussian pdf, the MSE estimate is a
nonlinear transformation of $\{\hat{I}_n\}$. In general, the “best” estimate $\{d_n\}$ is given
by

$$ \begin{aligned} d_n &= g(\hat{I}_n) && \text{(memoryless)} \\ d_n &= g(\hat{I}_n, \hat{I}_{n-1}, \ldots, \hat{I}_{n-m}) && \text{($m$th-order memory)} \end{aligned} \tag{11-5-18} $$

where $g(\cdot)$ is a nonlinear function. The sequence $\{d_n\}$ is then used to generate
an error signal, which is fed back into the adaptive equalization filter, as shown 
in Fig. 11-5-1. 

FIGURE 11-5-1 Adaptive blind equalization with stochastic gradient algorithms.

TABLE 11-5-1 STOCHASTIC GRADIENT ALGORITHMS FOR BLIND EQUALIZATION

Equalizer tap coefficients         $\{c_n,\ 0 \le n \le N - 1\}$
Received signal sequence           $\{v_n\}$
Equalizer output sequence          $\{\hat{I}_n\} = \{v_n\} \star \{c_n\}$
Equalizer error sequence           $\{e_n\} = g(\hat{I}_n) - \hat{I}_n$
Tap coefficient update equation    $\mathbf{c}_{n+1} = \mathbf{c}_n + \Delta \mathbf{v}_n^* e_n$

Algorithm              Nonlinearity: $g(\hat{I}_n)$
Godard                 $\dfrac{\hat{I}_n}{|\hat{I}_n|} (|\hat{I}_n| + R_2 |\hat{I}_n| - |\hat{I}_n|^3)$, $R_2 = \dfrac{E\{|I_n|^4\}}{E\{|I_n|^2\}}$
Sato                   $\zeta\,\mathrm{csgn}(\hat{I}_n)$, $\zeta = \dfrac{E\{[\mathrm{Re}(I_n)]^2\}}{E\{|\mathrm{Re}(I_n)|\}}$
Benveniste-Goursat     $\hat{I}_n + k_1(\tilde{I}_n - \hat{I}_n) + k_2 |\tilde{I}_n - \hat{I}_n| [\zeta\,\mathrm{csgn}(\hat{I}_n) - \hat{I}_n]$, where $k_1$ and $k_2$ are positive constants
Stop-and-Go            $\hat{I}_n + \tfrac{1}{2} A (\tilde{I}_n - \hat{I}_n) + \tfrac{1}{2} B (\tilde{I}_n - \hat{I}_n)^*$, where $(A, B) = (2, 0)$, $(1, 1)$, $(1, -1)$, or $(0, 0)$, depending on the signs of the decision-directed error $\tilde{I}_n - \hat{I}_n$ and the error $\zeta\,\mathrm{csgn}(\hat{I}_n) - \hat{I}_n$

A well-known classical estimation problem is the following. If the equalizer
output $\hat{I}_n$ is expressed as

$$ \hat{I}_n = I_n + \tilde{\eta}_n \tag{11-5-19} $$

where $\tilde{\eta}_n$ is assumed to be zero-mean gaussian (the central limit theorem may
be invoked here for the residual ISI and the additive noise), $\{I_n\}$ and $\{\tilde{\eta}_n\}$ are
statistically independent, and $\{I_n\}$ are statistically independent and identically
distributed random variables, then the MSE estimate of $\{I_n\}$ is

$$ d_n = E(I_n \mid \hat{I}_n) \tag{11-5-20} $$

which is a nonlinear function of the equalizer output when $\{I_n\}$ is nongaussian.
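
To make the nonlinearity in (11-5-20) concrete, consider the simplest special case. The sketch below assumes equiprobable binary symbols $I_n = \pm 1$ and real gaussian disturbance of variance `noise_var`; these assumptions and the function name `mse_estimate_binary` are ours, not part of the text. Under them, Bayes' rule gives the conditional mean in closed form, $E(I_n \mid \hat{I}_n) = \tanh(\hat{I}_n / \sigma^2)$, which is a soft limiter rather than a linear function of the equalizer output.

```python
import math

def mse_estimate_binary(y, noise_var):
    """Conditional mean E(I | y) for I = +1 or -1 (equiprobable), y = I + gaussian noise."""
    # Likelihoods of the observation under each symbol (common factors cancel).
    w_plus = math.exp(-(y - 1.0) ** 2 / (2.0 * noise_var))
    w_minus = math.exp(-(y + 1.0) ** 2 / (2.0 * noise_var))
    return (w_plus - w_minus) / (w_plus + w_minus)

# The Bayes computation agrees with the closed form tanh(y / noise_var).
for y in (-1.2, -0.3, 0.0, 0.3, 1.2):
    print(y, mse_estimate_binary(y, 0.5), math.tanh(y / 0.5))
```

For small noise variance the estimate approaches the hard decision $\mathrm{sgn}(\hat{I}_n)$, which is the form of nonlinearity used in the Sato algorithm for one-dimensional constellations.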

Table 11-5-1 illustrates the general form of existing blind equalization 
algorithms that are based on LMS adaptation. We observe that the basic 
difference among these algorithms lies in the choice of the memoryless 
nonlinearity. The most widely used algorithm in practice is the Godard 
algorithm, sometimes also called the constant-modulus algorithm (CMA). 

It is apparent from Table 11-5-1 that the output sequence $\{d_n\}$ obtained by
taking a nonlinear function of the equalizer output plays the role of the desired
response or a training sequence. It is also apparent that these algorithms are
simple to implement, since they are basically LMS-type algorithms. As such,
we expect that the convergence characteristics of these algorithms will depend
on the autocorrelation matrix of the received data $\{v_n\}$.

With regard to convergence, the adaptive LMS-type algorithms converge in
the mean when

$$ E[v_n g^*(\hat{I}_n)] = E[v_n \hat{I}_n^*] \tag{11-5-21} $$

and, in the mean square sense, when (superscript $H$ denotes the conjugate
transpose)

$$ \begin{aligned} E[\mathbf{c}_n^H \mathbf{v}_n g^*(\hat{I}_n)] &= E[\mathbf{c}_n^H \mathbf{v}_n \hat{I}_n^*] \\ E[\hat{I}_n g^*(\hat{I}_n)] &= E[|\hat{I}_n|^2] \end{aligned} \tag{11-5-22} $$

FIGURE 11-5-2


Therefore, it is required that the equalizer output $\{\hat{I}_n\}$ satisfy (11-5-22). Note
that (11-5-22) states that the autocorrelation of $\{\hat{I}_n\}$ (the right-hand side) equals
the cross-correlation between $\hat{I}_n$ and a nonlinear transformation of $\hat{I}_n$ (left-hand
side). Processes that satisfy this property are called Bussgang (1952), as named
by Bellini (1986). In summary, the algorithms given in Table 11-5-1 converge
when the equalizer output sequence $\hat{I}_n$ satisfies the Bussgang property.

The basic limitation of stochastic gradient algorithms is their relatively slow 
convergence. Some improvement in the convergence rate can be achieved by 
modifying the adaptive algorithms from LMS-type to recursive-least-square 
(RLS) type. 

Godard Algorithm As indicated above, the Godard blind equalization 
algorithm is a steepest-descent algorithm that is widely used in practice when a 
training sequence is not available. Let us describe this algorithm in more detail. 

Godard considered the problem of combined equalization and carrier phase 
recovery and tracking. The carrier phase tracking is performed at baseband, 
following the equalizer as shown in Fig. 11-5-2. Based on this structure, we 
may express the equalizer output as

$$ \hat{I}_k = \sum_{n=-K}^{K} c_n v_{k-n} \tag{11-5-23} $$

and the input to the decision device as $\hat{I}_k \exp(-j\hat{\phi}_k)$, where $\hat{\phi}_k$ is the carrier
phase estimate in the $k$th symbol interval.

If the desired symbol were known, we could form the error signal

$$ \varepsilon_k = I_k - \hat{I}_k e^{-j\hat{\phi}_k} \tag{11-5-24} $$

and minimize the MSE with respect to $\hat{\phi}_k$ and $\{c_n\}$, i.e.,

$$ \min_{\hat{\phi}_k,\, \mathbf{C}} E(|I_k - \hat{I}_k e^{-j\hat{\phi}_k}|^2) \tag{11-5-25} $$

Caption of Fig. 11-5-2: Godard scheme for combined adaptive (blind) equalization and carrier phase tracking.
This criterion leads us to use the LMS algorithm for recursively estimating $\mathbf{C}$
and $\hat{\phi}_k$. The LMS algorithm based on knowledge of the transmitted sequence
is

$$ \hat{\mathbf{C}}_{k+1} = \hat{\mathbf{C}}_k + \Delta_c (I_k - \hat{I}_k e^{-j\hat{\phi}_k}) \mathbf{V}_k^* e^{j\hat{\phi}_k} \tag{11-5-26} $$

$$ \hat{\phi}_{k+1} = \hat{\phi}_k + \Delta_\phi\, \mathrm{Im}(I_k \hat{I}_k^* e^{j\hat{\phi}_k}) \tag{11-5-27} $$

where $\Delta_c$ and $\Delta_\phi$ are the step-size parameters for the two recursive equations.
Note that these recursive equations are coupled together. Unfortunately, these
equations will not converge, in general, when the desired symbol sequence $\{I_k\}$
is unknown.

The approach proposed by Godard is to use a criterion that depends on the 
amount of intersymbol interference at the output of the equalizer but one that 
is independent of the QAM signal constellation and the carrier phase. For 
example, a cost function that is independent of carrier phase and has the 
property that its minimum leads to a small MSE is 

$$ G^{(p)} = E(|\hat{I}_k|^p - |I_k|^p)^2 \tag{11-5-28} $$

where $p$ is a positive and real integer. Minimization of $G^{(p)}$ with respect to the
equalizer coefficients results in the equalization of the signal amplitude only.
Based on this observation, Godard selected a more general cost function,
called the dispersion of order $p$, defined as

$$ D^{(p)} = E(|\hat{I}_k|^p - R_p)^2 \tag{11-5-29} $$

where $R_p$ is a positive real constant. As in the case of $G^{(p)}$, we observe that
$D^{(p)}$ is independent of the carrier phase.

Minimization of $D^{(p)}$ with respect to the equalizer coefficients can be
performed recursively according to the steepest-descent algorithm

$$ \mathbf{C}_{k+1} = \mathbf{C}_k - \Delta_p \frac{dD^{(p)}}{d\mathbf{C}_k} \tag{11-5-30} $$

where $\Delta_p$ is the step-size parameter. By differentiating $D^{(p)}$ and dropping the
expectation operation, we obtain the following LMS-type algorithm for
adjusting the equalizer coefficients:

$$ \mathbf{C}_{k+1} = \mathbf{C}_k + \Delta_p \mathbf{V}_k^* \hat{I}_k |\hat{I}_k|^{p-2} (R_p - |\hat{I}_k|^p) \tag{11-5-31} $$

where $\Delta_p$ is the step-size parameter and the optimum choice of $R_p$ is

$$ R_p = \frac{E(|I_k|^{2p})}{E(|I_k|^p)} \tag{11-5-32} $$


As expected, the recursion in (11-5-31) for $\mathbf{C}_k$ does not require knowledge
of the carrier phase. Carrier phase tracking may be carried out in a
decision-directed mode according to (11-5-27).

Of particular importance is the case $p = 2$, which leads to the relatively
simple algorithm

$$ \begin{aligned} \mathbf{C}_{k+1} &= \mathbf{C}_k + \Delta_p \mathbf{V}_k^* \hat{I}_k (R_2 - |\hat{I}_k|^2) \\ \hat{\phi}_{k+1} &= \hat{\phi}_k + \Delta_\phi\, \mathrm{Im}(\tilde{I}_k \hat{I}_k^* e^{j\hat{\phi}_k}) \end{aligned} \tag{11-5-33} $$

where $\tilde{I}_k$ is the output decision based on $\hat{I}_k$, and

$$ R_2 = \frac{E(|I_k|^4)}{E(|I_k|^2)} \tag{11-5-34} $$


Convergence of the algorithm given in (11-5-33) was demonstrated in the 
paper by Godard (1980). Initially, the equalizer coefficients were set to zero 
except for the center (reference) tap, which was set according to the condition 


$$ |c_0|^2 > \frac{E|I_k|^4}{2 |x_0|^2 [E(|I_k|^2)]^2} \tag{11-5-35} $$

which is sufficient, but not necessary, for convergence of the algorithm.
Simulations performed by Godard on simulated telephone channels with
typical frequency response characteristics and transmission rates of
7200-12 000 bits/s indicate that the algorithm in (11-5-31) performs well and leads to
convergence in 5000-20 000 iterations, depending on the signal constellation.
Initially, the eye pattern was closed prior to equalization. The number of
iterations required for convergence is about an order of magnitude greater
than the number required to equalize the channels with a known training
sequence. No apparent difficulties were encountered in using the
decision-directed phase estimation algorithm in (11-5-33) from the beginning of the
equalizer adjustment process.
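
The tap update of (11-5-33) is short enough to sketch directly. The example below is a minimal illustration under assumptions of our own, not a reproduction of Godard's simulations: unit-modulus QPSK symbols (so that $R_2 = 1$), a hypothetical two-tap complex channel, no additive noise, a seven-tap equalizer with center-spike initialization, and no carrier phase tracking loop. The names `cma_equalize`, `residual_isi`, and `simulate` are hypothetical.

```python
import random

def cma_equalize(v, num_taps=7, step=0.005, r2=1.0):
    """Godard p = 2 tap update: C <- C + step * conj(V) * y * (R2 - |y|^2)."""
    c = [0j] * num_taps
    c[num_taps // 2] = 1 + 0j                            # center (reference) tap
    for k in range(num_taps - 1, len(v)):
        window = [v[k - n] for n in range(num_taps)]     # v_k, v_{k-1}, ...
        y = sum(cn * vn for cn, vn in zip(c, window))    # equalizer output
        err = y * (r2 - abs(y) ** 2)
        c = [cn + step * vn.conjugate() * err for cn, vn in zip(c, window)]
    return c

def residual_isi(f, c):
    """Energy outside the largest tap of the combined response, relative to that tap."""
    h = [0j] * (len(f) + len(c) - 1)
    for i, fi in enumerate(f):
        for j, cj in enumerate(c):
            h[i + j] += fi * cj
    p = [abs(x) ** 2 for x in h]
    return (sum(p) - max(p)) / max(p)

def simulate(num_symbols=5000, seed=1):
    rng = random.Random(seed)
    a = 2 ** -0.5
    symbols = [complex(rng.choice((-a, a)), rng.choice((-a, a)))
               for _ in range(num_symbols)]
    f = [1 + 0j, 0.3 + 0.3j]                             # assumed channel
    v = [sum(f[i] * symbols[k - i] for i in range(len(f)) if k - i >= 0)
         for k in range(num_symbols)]
    start = [0j] * 7
    start[3] = 1 + 0j
    return residual_isi(f, start), residual_isi(f, cma_equalize(v))

isi_before, isi_after = simulate()
print(isi_before, isi_after)
```

The update never uses the transmitted symbols or the carrier phase: only the modulus of the equalizer output enters the error term. With these unit-energy symbols and $x_0 = 1$, the center-tap value of 1 satisfies the sufficient condition (11-5-35), whose right-hand side is 1/2 in this case.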


11-5-3 Blind Equalization Algorithms Based on Second- and 
Higher-Order Signal Statistics 

It is well known that second-order statistics (autocorrelation) of the received 
signal sequence provide information on the magnitude of the channel 
characteristics, but not on the phase. However, this statement is not correct if 
the autocorrelation function of the received signal is periodic, as is the case 
for a digitally modulated signal. In such a case, it is possible to obtain a 
measurement of the amplitude and the phase of the channel from the received 
signal. This cyclostationarity property of the received signal forms the basis for 
a channel estimation algorithm devised by Tong et al. (1993). 

It is also possible to estimate the channel response from the received signal
by using higher-order statistical methods. In particular, the impulse response of
a linear, discrete-time, time-invariant system can be obtained explicitly from
cumulants of the received signal, provided that the channel input is nongaussian.
We describe the following simple method for estimation of the channel
impulse response from fourth-order cumulants of the received signal sequence.
The fourth-order cumulant is defined as

$$ \begin{aligned} c(v_k, v_{k+m}, v_{k+n}, v_{k+l}) \equiv c_r(m, n, l) &= E(v_k v_{k+m} v_{k+n} v_{k+l}) \\ &\quad - E(v_k v_{k+m}) E(v_{k+n} v_{k+l}) \\ &\quad - E(v_k v_{k+n}) E(v_{k+m} v_{k+l}) \\ &\quad - E(v_k v_{k+l}) E(v_{k+m} v_{k+n}) \end{aligned} \tag{11-5-36} $$

(The fourth-order cumulant of a gaussian signal process is zero.) Consequently,
it follows that

$$ c_r(m, n, l) = c(I_k, I_{k+m}, I_{k+n}, I_{k+l}) \sum_{k=0}^{\infty} f_k f_{k+m} f_{k+n} f_{k+l} \tag{11-5-37} $$

For a statistically independent and identically distributed input sequence $\{I_n\}$
to the channel, $c(I_k, I_{k+m}, I_{k+n}, I_{k+l}) = k$, a constant, called the kurtosis. Then,
if the length of the channel response is $L + 1$, we may let $m = n = l = -L$ so
that

$$ c_r(-L, -L, -L) = k f_L f_0^3 \tag{11-5-38} $$

Similarly, if we let $m = 0$, $n = L$ and $l = p$, we obtain

$$ c_r(0, L, p) = k f_L f_0^2 f_p \tag{11-5-39} $$

If we combine (11-5-38) and (11-5-39), we obtain the impulse response within a
scale factor as

$$ f_p = f_0 \frac{c_r(0, L, p)}{c_r(-L, -L, -L)}, \qquad p = 1, 2, \ldots, L \tag{11-5-40} $$

The cumulants $c_r(m, n, l)$ are estimated from sample averages of the received
signal sequence $\{v_n\}$.
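
The algebra leading to (11-5-40) can be checked numerically. The sketch below does not estimate cumulants from data; as a simplifying assumption it evaluates them exactly from the channel taps through (11-5-37), and then recovers the taps through (11-5-40). The function names and the example channel are hypothetical, and the kurtosis value of -2 corresponds to equiprobable symbols $\pm 1$.

```python
def cumulant(f, m, n, l, kurtosis):
    """Fourth-order cumulant c_r(m, n, l) of the channel output, from (11-5-37)."""
    last = len(f) - 1

    def tap(i):
        return f[i] if 0 <= i <= last else 0.0   # taps outside 0..L are zero

    return kurtosis * sum(tap(k) * tap(k + m) * tap(k + n) * tap(k + l)
                          for k in range(last + 1))

def channel_from_cumulants(c_r, length_minus_one, f0=1.0):
    """Recover f_1..f_L relative to f_0 by the ratio in (11-5-40)."""
    big_l = length_minus_one
    denom = c_r(-big_l, -big_l, -big_l)
    return [f0] + [f0 * c_r(0, big_l, p) / denom for p in range(1, big_l + 1)]

# Assumed example: three-tap channel, binary input with kurtosis 1 - 3 = -2.
f_true = [1.0, 0.5, -0.25]
c_r = lambda m, n, l: cumulant(f_true, m, n, l, kurtosis=-2.0)
f_hat = channel_from_cumulants(c_r, length_minus_one=2, f0=f_true[0])
print(f_hat)
```

In practice the exact cumulants are replaced by sample averages of products of received samples, and the accuracy of the recovered taps is then limited by the variance of those fourth-order estimates.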

Another approach based on higher-order statistics is due to Hatzinakos and 
Nikias (1991). They have introduced the first polyspectra-based adaptive blind 
equalization method named the tricepstrum equalization algorithm (TEA). This 
method estimates the channel response characteristics by using the complex 
cepstrum of the fourth-order cumulants (tricepstrum) of the received signal 
sequence $\{v_n\}$. TEA depends only on fourth-order cumulants of $\{v_n\}$ and is
capable of separately reconstructing the minimum-phase and maximum-phase
characteristics of the channel. The channel equalizer coefficients are then
computed from the measured channel characteristics. The basic approach used
in TEA is to compute the tricepstrum of the received sequence $\{v_n\}$, which is
the inverse (three-dimensional) Fourier transform of the logarithm of the
trispectrum of $\{v_n\}$. (The trispectrum is the three-dimensional discrete Fourier
transform of the fourth-order cumulant sequence $c_r(m, n, l)$.) The equalizer
coefficients are then computed from the cepstral coefficients.

By separating the channel estimation from the channel equalization, it is
possible to use any type of equalizer for the ISI, i.e., either linear, or
decision-feedback, or maximum-likelihood sequence detection. The major
disadvantage with this class of algorithms is the large amount of data and the
inherent computational complexity involved in the estimation of the higher-order
moments (cumulants) of the received signal.

In conclusion, we have provided an overview of three classes of blind 
equalization algorithms that find applications in digital communications. Of the 
three families of algorithms described, those based on the maximum-likelihood 
criterion for jointly estimating the channel impulse response and the data 
sequence are optimal and require relatively few received signal samples for 
performing channel estimation. However, the computational complexity of the 
algorithms is large when the ISI spans many symbols. On some channels, such 
as the mobile radio channel, where the span of the ISI is relatively short, these 
algorithms are simple to implement. However, on telephone channels, where 
the ISI spans many symbols but is usually not too severe, the LMS-type 
(stochastic gradient) algorithms are generally employed. 


11-6 BIBLIOGRAPHICAL NOTES AND REFERENCES 

Adaptive equalization for digital communications was developed by Lucky 
(1965, 1966). His algorithm was based on the peak distortion criterion and led 
to the zero-forcing algorithm. Lucky's work was a major breakthrough, which 
led to the rapid development of high-speed modems within five years of 
publication of his work. Concurrently, the LMS algorithm was devised by 
Widrow (1966), and its use for adaptive equalization for complex-valued 
(in-phase and quadrature components) signals was described and analyzed in a 
tutorial paper by Proakis and Miller (1969). 

A tutorial treatment of adaptive equalization algorithms that were developed
during the period 1965-1975 is given by Proakis (1975). A more recent
tutorial treatment of adaptive equalization is given in the paper by Qureshi 
(1985). The major breakthrough in adaptive equalization techniques, beginning 
with the work of Lucky in 1965 coupled with the development of trellis-coded 
modulation, which was proposed by Ungerboeck and Csajka (1976), has led to 
the development of commercially available high speed modems with a 
capability of speeds of 9600-28 800 bits/s on telephone channels. 

The use of a more rapidly converging algorithm for adaptive equalization 
was proposed by Godard (1974). Our derivation of the RLS (Kalman) 
algorithm, described in Section 11-4-1, follows the approach outlined by 
Picinbono (1978). RLS lattice algorithms for general signal estimation applications
were developed by Morf et al. (1977, 1979). The applications of these
algorithms have been investigated by several researchers, including Makhoul
(1978), Satorius and Pack (1981), Satorius and Alexander (1979), and Ling and
Proakis (1982, 1984a-c, 1985). The fast RLS Kalman algorithm for adaptive
equalization was first described by Falconer and Ljung (1978). The above
references are just a few of the important papers that have been published on
RLS algorithms for adaptive equalization and other applications.

Sato’s (1975) original work on blind equalization was focused on PAM
(one-dimensional) signal constellations. Subsequently it was generalized to
two-dimensional and multidimensional signal constellations in the algorithms 
devised by Godard (1980), Benveniste and Goursat (1984), Sato (1986), 
Foschini (1985), Picchi and Prati (1987), and Shalvi and Weinstein (1990). 
Blind equalization methods based on the use of second- and higher-order 
moments of the received signal were proposed by Hatzinakos and Nikias 
(1991) and Tong et al. (1994). The use of the maximum-likelihood criterion for 
joint channel estimation and data detection has been investigated and treated 
in papers by Seshadri (1991), Ghosh and Weber (1991), Zervas et al. (1991) 
and Raheli et al. (1995). Finally, the convergence characteristics of stochastic 
gradient blind equalization algorithms have been investigated by Ding (1990), 
Ding et al. (1989), and Johnson (1991). 


PROBLEMS 


11-1 An equivalent discrete-time channel with white gaussian noise is shown in Fig.
P11-1.

a Suppose we use a linear equalizer to equalize the channel. Determine the tap
coefficients $c_{-1}$, $c_0$, $c_1$ of a three-tap equalizer. To simplify the computation, let
the AWGN be zero.

b The tap coefficients of the linear equalizer in (a) are determined recursively via
the algorithm

$$ \mathbf{C}_{k+1} = \mathbf{C}_k - \Delta \mathbf{g}_k, \qquad \mathbf{C}_k = [c_{-1k} \;\; c_{0k} \;\; c_{1k}]^t $$

where $\mathbf{g}_k = \mathbf{\Gamma}\mathbf{C}_k - \mathbf{b}$ is the gradient vector and $\Delta$ is the step size. Determine the
range of values of $\Delta$ to ensure convergence of the recursive algorithm. To
simplify the computation, let the AWGN be zero.

c Determine the tap weights of a DFE with two feedforward taps and one
feedback tap. To simplify the computation, let the AWGN be zero.

FIGURE P11-1

11-2 Refer to Problem 10-18 and answer the following questions.
a Determine the maximum value of $\Delta$ that can be used to ensure that the
equalizer coefficients converge during operation in the adaptive mode.
b What is the variance of the self-noise generated by the three-tap equalizer when
operating in an adaptive mode, as a function of $\Delta$? Suppose it is desired to limit
the variance of the self-noise to 10% of the minimum MSE for the three-tap
equalizer when $N_0 = 0.1$. What value of $\Delta$ would you select?
c If the optimum coefficients of the equalizer are computed recursively by the
method of steepest descent, the recursive equation can be expressed in the form

$$ \mathbf{C}_{(n+1)} = (\mathbf{I} - \Delta\mathbf{\Gamma}) \mathbf{C}_{(n)} + \Delta\boldsymbol{\xi} $$

where $\mathbf{I}$ is the identity matrix. The above represents a set of three coupled
first-order difference equations. They can be decoupled by a linear transformation
that diagonalizes the matrix $\mathbf{\Gamma}$. That is, $\mathbf{\Gamma} = \mathbf{U}\mathbf{\Lambda}\mathbf{U}^t$, where $\mathbf{\Lambda}$ is the diagonal
matrix having the eigenvalues of $\mathbf{\Gamma}$ as its diagonal elements and $\mathbf{U}$ is the
(normalized) modal matrix that can be obtained from your answer to 10-18(b).
Let $\mathbf{C}' = \mathbf{U}^t\mathbf{C}$ and determine the steady-state solution for $\mathbf{C}'$. From this, evaluate
$\mathbf{C} = (\mathbf{U}^t)^{-1}\mathbf{C}' = \mathbf{U}\mathbf{C}'$ and, thus, show that your answer agrees with the result
obtained in 10-18(a).

11-3 When a periodic pseudo-random sequence of length $N$ is used to adjust the
coefficients of an $N$-tap linear equalizer, the computations can be performed
efficiently in the frequency domain by use of the discrete Fourier transform
(DFT). Suppose that $\{y_n\}$ is a sequence of $N$ received samples (taken at the symbol
rate) at the equalizer input. Then the computation of the equalizer coefficients is
performed as follows.

a Compute the DFT of one period of the equalizer input sequence $\{y_n\}$, i.e.,

$$ Y_k = \sum_{n=0}^{N-1} y_n e^{-j2\pi nk/N} $$

b Compute the desired equalizer spectrum

$$ C_k = \frac{X_k Y_k^*}{|Y_k|^2}, \qquad k = 0, 1, \ldots, N - 1 $$

where $\{X_k\}$ is the precomputed DFT of the training sequence.
c Compute the inverse DFT of $\{C_k\}$ to obtain the equalizer coefficients $\{c_n\}$. Show
that this procedure in the absence of noise yields an equalizer whose frequency
response is equal to the frequency response of the inverse folded channel
spectrum at the $N$ uniformly spaced frequencies $f_k = k/NT$, $k = 0, 1, \ldots, N - 1$.

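11-4 Show that the gradient vector in the minimization of the MSE may be expressed as

$$ \mathbf{G}_k = -E(\varepsilon_k \mathbf{V}_k^*) $$

where the error $\varepsilon_k = I_k - \hat{I}_k$, and the estimate of $\mathbf{G}_k$, i.e.,

$$ \hat{\mathbf{G}}_k = -\varepsilon_k \mathbf{V}_k^* $$

satisfies the condition that $E(\hat{\mathbf{G}}_k) = \mathbf{G}_k$.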

11-5 The tap-leakage LMS algorithm proposed in the paper by Gitlin et al. (1982) may
be expressed as

$$ \mathbf{C}_N(n + 1) = w\,\mathbf{C}_N(n) + \Delta \varepsilon(n) \mathbf{V}_N^*(n) $$

where $0 < w < 1$, $\Delta$ is the step size, and $\mathbf{V}_N(n)$ is the data vector at time $n$.
Determine the condition for the convergence of the mean value of $\mathbf{C}_N(n)$.

11-6 Consider the random process

$$ x(n) = g\,v(n) + w(n), \qquad n = 0, 1, \ldots, M - 1 $$

where $v(n)$ is a known sequence, $g$ is a random variable with $E(g) = 0$, and
$E(g^2) = G$. The process $w(n)$ is a white noise sequence with

$$ \gamma_{ww}(m) = \sigma_w^2 \delta_m $$

Determine the coefficients of the linear estimator for $g$, that is,

$$ \hat{g} = \sum_{n=0}^{M-1} h(n) x(n) $$

that minimize the mean square error.

11-7 A digital transversal filter can be realized in the frequency-sampling form with
system function (see Problem 10-25)

$$ H(z) = \frac{1 - z^{-M}}{M} \sum_{k=0}^{M-1} \frac{H_k}{1 - e^{j2\pi k/M} z^{-1}} = H_1(z) H_2(z) $$

where $H_1(z)$ is the comb filter, $H_2(z)$ is the parallel bank of resonators, and $\{H_k\}$
are the values of the discrete Fourier transform (DFT).

a Suppose that this structure is implemented as an adaptive filter using the LMS
algorithm to adjust the filter (DFT) parameters $\{H_k\}$. Give the time-update
equation for these parameters. Sketch the adaptive filter structure.
b Suppose that this structure is used as an adaptive channel equalizer in which the
desired signal is

$$ d(n) = \sum_{k=0}^{M-1} A_k \cos \omega_k n, \qquad \omega_k = \frac{2\pi k}{M} $$

With this form for the desired signal, what advantages are there in the LMS
adaptive algorithm for the DFT coefficients $\{H_k\}$ over the direct-form structure
with coefficients $\{h(n)\}$? (See Proakis, 1970.)

11-8 Consider the performance index 


$$ J = h^2 + 40h + 28 $$

Suppose that we search for the minimum of $J$ by using the steepest-descent
algorithm

$$ h(n + 1) = h(n) - \tfrac{1}{2} \Delta g(n) $$

where $g(n)$ is the gradient.

a Determine the range of values of $\Delta$ that provides an overdamped system for the
adjustment process.

b Plot the expression for $J$ as a function of $n$ for a value of $\Delta$ in this range.
11-9 Determine the coefficients $a_1$ and $a_2$ for the linear predictor shown in Fig. P11-9,
given that the autocorrelation $\gamma_{xx}(m)$ of the input signal is

$$ \gamma_{xx}(m) = b^{|m|}, \qquad 0 < b < 1 $$

11-10 Determine the lattice filter and its optimum reflection coefficients corresponding to
the linear predictor in Problem 11-9.

11-11 Consider the adaptive FIR filter shown in Fig. P11-11. The system $C(z)$ is
characterized by the system function


Determine the optimum coefficients of the adaptive transversal (FIR) filter
$B(z) = b_0 + b_1 z^{-1}$ that minimize the mean square error. The additive noise is
white with variance $\sigma_w^2 = 0.1$.

11-12 An $N \times N$ correlation matrix $\mathbf{\Gamma}$ has eigenvalues $\lambda_1 > \lambda_2 > \cdots > \lambda_N > 0$ and
associated eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$. Such a matrix can be represented as

$$ \mathbf{\Gamma} = \sum_{i=1}^{N} \lambda_i \mathbf{v}_i \mathbf{v}_i^{*t} $$

a If $\mathbf{\Gamma} = \mathbf{\Gamma}^{1/2} \mathbf{\Gamma}^{1/2}$, where $\mathbf{\Gamma}^{1/2}$ is the square root of $\mathbf{\Gamma}$, show that $\mathbf{\Gamma}^{1/2}$ can be
represented as

$$ \mathbf{\Gamma}^{1/2} = \sum_{i=1}^{N} \lambda_i^{1/2} \mathbf{v}_i \mathbf{v}_i^{*t} $$

b Using this representation, determine a procedure for computing $\mathbf{\Gamma}^{1/2}$.

FIGURE P11-11
MULTICHANNEL AND MULTICARRIER SYSTEMS


In some applications, it is desirable to transmit the same information-bearing 
signal over several channels. This mode of transmission is used primarily in 
situations where there is a high probability that one or more of the channels 
will be unreliable from time to time. For example, radio channels such as 
ionospheric scatter and tropospheric scatter suffer from signal fading due to 
multipath, which renders the channels unreliable for short periods of time. As
another example, multichannel signaling is sometimes employed in military 
communication systems as a means of overcoming the effects of jamming of the 
transmitted signal. By transmitting the same information over multiple 
channels, we are providing signal diversity, which the receiver can exploit to 
recover the information. 

Another form of multichannel communications is multiple carrier transmission,
where the frequency band of the channel is subdivided into a number of
subchannels and information is transmitted on each of the subchannels. A 
rationale for subdividing the frequency band of a channel into a number of 
narrowband channels is given below. 

In this chapter, we consider both multichannel signal transmission and 
multicarrier transmission. We begin with a treatment of multichannel 
transmission. 

12-1 MULTICHANNEL DIGITAL COMMUNICATION 
IN AWGN CHANNELS 

In this section, we confine our attention to multichannel signaling over fixed
channels that differ only in attenuation and phase shift. The specific model for
the multichannel digital signaling system may be described as follows. The
signal waveforms, in general, are expressed as

$$ s_m^{(n)}(t) = \mathrm{Re}[s_{lm}^{(n)}(t) e^{j2\pi f_c t}], \qquad 0 \le t \le T, \quad n = 1, 2, \ldots, L, \quad m = 1, 2, \ldots, M \tag{12-1-1} $$

where $L$ is the number of channels and $M$ is the number of waveforms. The
waveforms are assumed to have equal energy and to be equally probable a
priori. The waveforms $\{s_m^{(n)}(t)\}$ transmitted over the $L$ channels are scaled by
the factors $\{\alpha_n\}$, phase-shifted by $\{\phi_n\}$, and corrupted by additive noise. The
equivalent lowpass signals received from the $L$ channels may be expressed as

$$ r_l^{(n)}(t) = \alpha_n e^{-j\phi_n} s_{lm}^{(n)}(t) + z_n(t), \qquad 0 \le t \le T, \quad n = 1, 2, \ldots, L, \quad m = 1, 2, \ldots, M \tag{12-1-2} $$

where $\{s_{lm}^{(n)}(t)\}$ are the equivalent lowpass transmitted waveforms and $\{z_n(t)\}$
represent the additive noise processes on the $L$ channels. We assume that
$\{z_n(t)\}$ are mutually statistically independent and identically distributed gaussian
noise random processes.

We consider two types of processing at the receiver, namely, coherent 
detection and noncoherent detection. The receiver for coherent detection 
estimates the channel parameters $\{\alpha_n\}$ and $\{\phi_n\}$ and uses the estimates in
computing the decision variables. Suppose we define $g_n = \alpha_n e^{-j\phi_n}$ and let $\hat{g}_n$ be
the estimate of $g_n$. The multichannel receiver correlates each of the $L$ received
signals with a replica of the corresponding transmitted signals, multiplies each
of the correlator outputs by the corresponding estimates $\{\hat{g}_n^*\}$, and sums the
resulting signals. Thus, the decision variables for coherent detection are the
correlation metrics

$$ CM_m = \sum_{n=1}^{L} \mathrm{Re}\left[ \hat{g}_n^* \int_0^T r_l^{(n)}(t) s_{lm}^{(n)*}(t)\, dt \right], \qquad m = 1, 2, \ldots, M \tag{12-1-3} $$

In noncoherent detection, no attempt is made to estimate the channel
parameters. The demodulator may base its decision either on the sum of the
envelopes (envelope detection) or the sum of the squared envelopes
(square-law detection) of the matched filter outputs. In general, the performance
obtained with envelope detection differs little from the performance obtained
with square-law detection in AWGN. However, square-law detection of
multichannel signaling in AWGN channels is considerably easier to analyze
than envelope detection. Therefore, we confine our attention to square-law
detection of the received signals of the $L$ channels, which produces the
decision variables

$$ CM_m = \sum_{n=1}^{L} \left| \int_0^T r_l^{(n)}(t) s_{lm}^{(n)*}(t)\, dt \right|^2, \qquad m = 1, 2, \ldots, M \tag{12-1-4} $$
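
The two sets of decision variables can be compared in a small discrete-time sketch. The code below is an illustration under assumptions of our own: the integrals are replaced by sums over four samples per symbol, the same pair of orthogonal equal-energy waveforms is used on every channel, the channel gains are arbitrary example values, the noise is set to zero, and the coherent receiver is given perfect estimates $\hat{g}_n = g_n$. All function and variable names are hypothetical.

```python
import cmath

def correlate(r, s):
    """Sampled stand-in for the integral of r(t) s*(t) over 0 <= t <= T."""
    return sum(x * y.conjugate() for x, y in zip(r, s))

def coherent_metrics(received, waveforms, g_hat):
    """CM_m = sum over channels of Re[conj(g_hat_n) * corr(r_n, s_m)], as in (12-1-3)."""
    return [sum((g.conjugate() * correlate(r, s)).real
                for r, g in zip(received, g_hat))
            for s in waveforms]

def square_law_metrics(received, waveforms):
    """CM_m = sum over channels of |corr(r_n, s_m)|^2, as in (12-1-4)."""
    return [sum(abs(correlate(r, s)) ** 2 for r in received) for s in waveforms]

# Two orthogonal waveforms, L = 2 channels with gains g_n = alpha_n exp(-j phi_n).
waveforms = [[1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j],
             [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j]]
gains = [cmath.rect(0.8, -0.3), cmath.rect(0.5, -1.1)]
received = [[g * x for x in waveforms[0]] for g in gains]   # waveform 1 sent, no noise

cm_coherent = coherent_metrics(received, waveforms, gains)
cm_square = square_law_metrics(received, waveforms)
print(cm_coherent, cm_square)
```

In both cases the metric for the transmitted waveform is the larger one. The coherent metric weights each channel by its gain before summing, whereas the square-law metric needs no channel estimate because the phase is removed by the magnitude operation.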


Let us consider binary signaling first, and assume that $s_{l1}^{(n)}(t)$, $n = 1, 2, \ldots, L$,
are the $L$ transmitted waveforms. Then an error is committed if $CM_2 > CM_1$,
or, equivalently, if the difference $D = CM_1 - CM_2 < 0$. For noncoherent
detection, this difference may be expressed as

$$ D = \sum_{n=1}^{L} (|X_n|^2 - |Y_n|^2) \tag{12-1-5} $$

where the variables $\{X_n\}$ and $\{Y_n\}$ are defined as

$$ \begin{aligned} X_n &= \int_0^T r_l^{(n)}(t) s_{l1}^{(n)*}(t)\, dt, \qquad n = 1, 2, \ldots, L \\ Y_n &= \int_0^T r_l^{(n)}(t) s_{l2}^{(n)*}(t)\, dt, \qquad n = 1, 2, \ldots, L \end{aligned} \tag{12-1-6} $$

The $\{X_n\}$ are mutually independent and identically distributed gaussian random
variables. The same statement applies to the variables $\{Y_n\}$. However, for any
$n$, $X_n$ and $Y_n$ may be correlated. For coherent detection, the difference
$D = CM_1 - CM_2$ may be expressed as

$$ D = \tfrac{1}{2} \sum_{n=1}^{L} (X_n Y_n^* + X_n^* Y_n) \tag{12-1-7} $$

where, by definition,

$$ \begin{aligned} Y_n &= \hat{g}_n, \qquad n = 1, 2, \ldots, L \\ X_n &= \int_0^T r_l^{(n)}(t) [s_{l1}^{(n)*}(t) - s_{l2}^{(n)*}(t)]\, dt \end{aligned} \tag{12-1-8} $$

If the estimates $\{\hat{g}_n\}$ are obtained from observation of the received signal over
one or more signaling intervals, as described in Appendix C, their statistical
characteristics are described by the gaussian distribution. Then the $\{Y_n\}$ are
characterized as mutually independent and identically distributed gaussian
random variables. The same statement applies to the variables $\{X_n\}$. As in
noncoherent detection, we allow for correlation between $X_n$ and $Y_n$, but not
between $X_m$ and $Y_n$ for $m \ne n$.


12-1-1 Binary Signals 

In Appendix B, we derive the probability that the general quadratic form 

l. 

D=2{A IX, I 2 + B |F„| 2 + CX„ Y* + C*X*Y „ ) (12-1-9) 

n -- l 

in complex-valued gaussian random variables is less than zero. This prob- 
ability, which is given in (B-21) of Appendix B, is the probability of error for 
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binary multichannel signaling in AWGN. A number of special cases are of 
particular importance. 

If the binary signals are antipodal and the estimates Of {g„} are perfect, as in 
coherent PSK, the probability of error takes the simple form 

n = <?(V^) (12-1-10) 

where 

Sf ^ 2 

yt> ~~ ls«l 

” Ob-I 

= «, 2 , ( 12 - 1 - 11 ) 

™(> /r - 1 

is the SNR per bit. If the channels are all identical, a fl = a for all n and, hence, 

/ F 

7 * = — « 2 ( 12 - 1 - 12 ) 

K 

We observe that $L\mathcal{E}$ is the total transmitted signal energy for the $L$ signals. The
interpretation of this result is that the receiver combines the energy from the $L$
channels in an optimum manner. That is, there is no loss in performance in
dividing the total transmitted signal energy among the $L$ channels. The same
performance is obtained as in the case in which a single waveform having
energy $L\mathcal{E}$ is transmitted on one channel. This behavior holds true only if the
estimates $\hat{g}_n = g_n$ for all $n$. If the estimates are not perfect, a loss in
performance occurs, the amount of which depends on the quality of the
estimates, as described in Appendix C.
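
As a numerical illustration of (12-1-10) through (12-1-12), the following Python sketch compares $L$ identical channels, each carrying energy $\mathcal{E}$, with a single channel carrying the total energy $L\mathcal{E}$. The function names and the numerical values are assumptions chosen only for illustration; $Q(x)$ is evaluated through the complementary error function.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def snr_per_bit(energy, n0, alphas):
    """gamma_b of (12-1-11): (E / N0) * sum of alpha_n^2 over the L channels."""
    return (energy / n0) * sum(a * a for a in alphas)

def coherent_error_probability(energy, n0, alphas):
    """P_b = Q(sqrt(2 * gamma_b)), antipodal signals with perfect estimates."""
    return q_function(math.sqrt(2.0 * snr_per_bit(energy, n0, alphas)))

# Hypothetical numbers: L = 4 identical channels with energy E each,
# versus one channel that carries the total energy L * E.
L, E, N0, alpha = 4, 1.0, 1.0, 1.0
p_multi = coherent_error_probability(E, N0, [alpha] * L)
p_single = coherent_error_probability(L * E, N0, [alpha])
print(p_multi, p_single)  # the two probabilities coincide
```

Under these assumptions the two printed values are equal, which is the statement that coherent combining with perfect estimates incurs no loss.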

Perfect estimates for $\{g_n\}$ constitute an extreme case. At the other extreme,
we have binary DPSK signaling. In DPSK, the estimates $\{\hat{g}_n\}$ are simply the
(normalized) signal-plus-noise samples at the outputs of the matched filters in
the previous signaling interval. This is the poorest estimate that one might
consider using in estimating $\{g_n\}$. For binary DPSK, the probability of error
obtained from (B-21) is

$$P_b = \frac{1}{2^{2L-1}}\, e^{-\gamma_b} \sum_{n=0}^{L-1} c_n \gamma_b^n \tag{12-1-13}$$

where, by definition,

$$c_n = \frac{1}{n!} \sum_{k=0}^{L-1-n} \binom{2L-1}{k} \tag{12-1-14}$$

and $\gamma_b$ is the SNR per bit defined in (12-1-11) and, for identical channels, in
(12-1-12). This result can be compared with the single-channel ($L = 1$) error
probability. To simplify the comparison, we assume that the $L$ channels have
identical attenuation factors. Thus, for the same value of $\gamma_b$, the performance
of the multichannel system is poorer than that of the single-channel system.
That is, splitting the total transmitted energy among $L$ channels results in a loss
in performance, the amount of which depends on $L$.
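
The dependence on $L$ can be seen by evaluating (12-1-13) and (12-1-14) directly. The sketch below is a minimal Python illustration; the function names and the chosen SNR value are assumptions, and $\gamma_b$ is entered on a linear scale (not in dB).

```python
import math

def c_coeff(n, L):
    """c_n of (12-1-14): (1/n!) * sum_{k=0}^{L-1-n} C(2L-1, k)."""
    return sum(math.comb(2 * L - 1, k) for k in range(L - n)) / math.factorial(n)

def dpsk_multichannel_pb(gamma_b, L):
    """Binary DPSK error probability of (12-1-13) for L channels."""
    total = sum(c_coeff(n, L) * gamma_b ** n for n in range(L))
    return math.exp(-gamma_b) * total / 2 ** (2 * L - 1)

gamma_b = 10.0  # hypothetical total SNR per bit, linear scale
for L in (1, 2, 4):
    # For fixed gamma_b the error probability grows with L.
    print(L, dpsk_multichannel_pb(gamma_b, L))
```

For $L = 1$ the expression reduces to $\frac{1}{2}e^{-\gamma_b}$, the single-channel DPSK result, and for $L = 2$ it reduces to $e^{-\gamma_b}(4 + \gamma_b)/8$.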
FIGURE 12-1-1 Combining loss in noncoherent detection and combination of binary multichannel signals.


A loss in performance also occurs in square-law detection of orthogonal
signals transmitted over $L$ channels. For binary orthogonal signaling, the
expression for the probability of error is identical in form to that for binary
DPSK given in (12-1-13), except that $\gamma_b$ is replaced by $\frac{1}{2}\gamma_b$. That is,
binary orthogonal signaling with noncoherent detection is 3 dB poorer than
binary DPSK. However, the loss in performance due to noncoherent
combination of the signals received on the $L$ channels is identical to that for binary
DPSK.

Figure 12-1-1 illustrates the loss resulting from noncoherent (square-law)
combining of the $L$ signals as a function of $L$. The probability of error is not
shown, but it can be easily obtained from the curve of the expression

$$P_b = \tfrac{1}{2}\, e^{-\gamma_b} \tag{12-1-15}$$

which is the error probability of binary DPSK shown in Fig. 5-2-12, by
degrading the required SNR per bit, $\gamma_b$, by the noncoherent combining loss
corresponding to the value of $L$.

12-1-2 M-ary Orthogonal Signals

Now let us consider $M$-ary orthogonal signaling with square-law detection and
combination of the signals on the $L$ channels. The decision variables are given
by (12-1-4). Suppose that the signals $s_{l1}^{(n)}(t)$, $n = 1, 2, \ldots, L$, are transmitted
over the $L$ AWGN channels. Then, the decision variables are expressed as

$$\begin{aligned}
U_1 &= \sum_{n=1}^{L} |2\mathcal{E}\alpha_n + N_{n1}|^2 \\
U_m &= \sum_{n=1}^{L} |N_{nm}|^2, \qquad m = 2, 3, \ldots, M
\end{aligned} \tag{12-1-16}$$

where the $\{N_{nm}\}$ are complex-valued zero-mean gaussian random variables
with variance $\sigma^2 = \frac{1}{2}E(|N_{nm}|^2) = 2\mathcal{E}N_0$. Hence $U_1$ is described statistically as a
noncentral chi-square random variable with $2L$ degrees of freedom and
noncentrality parameter

$$s^2 = \sum_{n=1}^{L} (2\mathcal{E}\alpha_n)^2 = 4\mathcal{E}^2 \sum_{n=1}^{L} \alpha_n^2 \tag{12-1-17}$$


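Using (2-1-118), we obtain the pdf of $U_1$ as

$$p(u_1) = \frac{1}{4\mathcal{E}N_0} \left(\frac{u_1}{s^2}\right)^{(L-1)/2} \exp\left(-\frac{s^2 + u_1}{4\mathcal{E}N_0}\right) I_{L-1}\left(\frac{s\sqrt{u_1}}{2\mathcal{E}N_0}\right), \qquad u_1 \geq 0 \tag{12-1-18}$$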


On the other hand, the $\{U_m\}$, $m = 2, 3, \ldots, M$, are statistically independent
and identically chi-square-distributed random variables, each having $2L$
degrees of freedom. Using (2-1-110), we obtain the pdf for $U_m$ as

$$p(u_m) = \frac{1}{(4\mathcal{E}N_0)^L (L-1)!}\, u_m^{L-1}\, e^{-u_m/4\mathcal{E}N_0}, \qquad u_m \geq 0, \quad m = 2, 3, \ldots, M \tag{12-1-19}$$

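The probability of a symbol error is

$$\begin{aligned}
P_M &= 1 - P_c \\
&= 1 - P(U_2 < U_1,\ U_3 < U_1,\ \ldots,\ U_M < U_1) \\
&= 1 - \int_0^\infty \left[P(U_2 < u_1 \mid U_1 = u_1)\right]^{M-1} p(u_1)\, du_1
\end{aligned} \tag{12-1-20}$$

But

$$P(U_2 < u_1 \mid U_1 = u_1) = 1 - \exp\left(-\frac{u_1}{4\mathcal{E}N_0}\right) \sum_{k=0}^{L-1} \frac{1}{k!} \left(\frac{u_1}{4\mathcal{E}N_0}\right)^k \tag{12-1-21}$$

Hence,

$$\begin{aligned}
P_M &= 1 - \int_0^\infty \left[1 - e^{-u_1/4\mathcal{E}N_0} \sum_{k=0}^{L-1} \frac{1}{k!} \left(\frac{u_1}{4\mathcal{E}N_0}\right)^k\right]^{M-1} p(u_1)\, du_1 \\
&= 1 - \int_0^\infty \left[1 - e^{-v} \sum_{k=0}^{L-1} \frac{v^k}{k!}\right]^{M-1} \left(\frac{v}{\gamma}\right)^{(L-1)/2} e^{-(\gamma + v)}\, I_{L-1}\left(2\sqrt{\gamma v}\right) dv
\end{aligned} \tag{12-1-22}$$

where

$$\gamma = \frac{\mathcal{E}}{N_0} \sum_{n=1}^{L} \alpha_n^2$$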
The integral in (12-1-22) can be evaluated numerically. It is also possible to
expand the term $(1 - x)^{M-1}$ in (12-1-22) and carry out the integration term by
term. This approach yields an expression for $P_M$ in terms of finite sums.

An alternative approach is to use the union bound

$$P_M < (M - 1) P_2(L) \tag{12-1-23}$$

where $P_2(L)$ is the probability of error in choosing between $U_1$ and any one of
the $M - 1$ decision variables $\{U_m\}$, $m = 2, 3, \ldots, M$. From our previous
discussion on the performance of binary orthogonal signaling, we have

$$P_2(L) = \frac{1}{2^{2L-1}}\, e^{-k\gamma_b/2} \sum_{n=0}^{L-1} c_n \left(\tfrac{1}{2} k \gamma_b\right)^n \tag{12-1-24}$$

where $c_n$ is given by (12-1-14) and $k = \log_2 M$ is the number of bits per
symbol. For relatively small values of $M$, the union
bound in (12-1-23) is sufficiently tight for most practical applications.
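
The union bound is straightforward to evaluate numerically. The following Python sketch implements (12-1-23) and (12-1-24) under the assumption that $k = \log_2 M$ and that $\gamma_b$ is given on a linear scale; the function names are illustrative only.

```python
import math

def c_coeff(n, L):
    """c_n of (12-1-14)."""
    return sum(math.comb(2 * L - 1, k) for k in range(L - n)) / math.factorial(n)

def p2(gamma_b, L, k):
    """Pairwise error probability P_2(L) of (12-1-24)."""
    x = 0.5 * k * gamma_b
    total = sum(c_coeff(n, L) * x ** n for n in range(L))
    return math.exp(-x) * total / 2 ** (2 * L - 1)

def union_bound(gamma_b, L, M):
    """Upper bound (M - 1) * P_2(L) on the symbol error probability."""
    k = math.log2(M)  # bits per symbol (assumed: M is a power of 2)
    return (M - 1) * p2(gamma_b, L, k)

gamma_b = 8.0  # hypothetical SNR per bit, linear scale
for M in (2, 4, 8):
    print(M, union_bound(gamma_b, L=2, M=M))
```

For $M = 2$ and $L = 1$ the bound reduces to $\frac{1}{2}e^{-\gamma_b/2}$, the error probability of binary orthogonal signaling with noncoherent detection.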


12-2 MULTICARRIER COMMUNICATIONS 

From our treatment of nonideal linear filter channels in Chapters 10 and 11, we 
have observed that such channels introduce ISI, which degrades performance 
compared with the ideal channel. The degree of performance degradation 
depends on the frequency response characteristics. Furthermore, the com- 
plexity of the receiver increases as the span of the ISI increases. 

Given a particular channel characteristic, the communication system desig- 
ner must decide how to efficiently utilize the available channel bandwidth in 
order to transmit the information reliably within the transmitter power 
constraint and receiver complexity constraints. For a nonideal linear filter 
channel, one option is to employ a single carrier system in which the 
information sequence is transmitted serially at some specified rate R symbols/s. 
In such a channel, the time dispersion is generally much greater than the
symbol interval and, hence, ISI results from the nonideal frequency response
characteristics of the channel. As we have observed, an equalizer is necessary 
to compensate for the channel distortion. 

An alternative approach to the design of a bandwidth-efficient communica- 
tion system in the presence of channel distortion is to subdivide the available 
channel bandwidth into a number of subchannels, such that each subchannel is 
nearly ideal. To elaborate, suppose that $C(f)$ is the frequency response of a
nonideal, band-limited channel with a bandwidth $W$, and that the power
spectral density of the additive gaussian noise is $\Phi_{nn}(f)$. Then, we divide the
bandwidth $W$ into $N = W/\Delta f$ subbands of width $\Delta f$, where $\Delta f$ is chosen
sufficiently small that $|C(f)|^2/\Phi_{nn}(f)$ is approximately a constant within each
subband. Furthermore, we shall select the transmitted signal power to be
distributed in frequency as $P(f)$, subject to the constraint that

$$\int_W P(f)\, df \leq P_{av} \tag{12-2-1}$$





where $P_{av}$ is the available average power of the transmitter. Let us evaluate
the capacity of the nonideal additive gaussian noise channel.


12-2-1 Capacity of a Nonideal Linear Filter Channel 

Recall that the capacity of an ideal, band-limited, AWGN channel is

$$C = W \log_2\left(1 + \frac{P_{av}}{W N_0}\right) \tag{12-2-2}$$


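where $C$ is the capacity in bits/s, $W$ is the channel bandwidth, $P_{av}$ is the
average transmitted power, and $N_0$ is the power spectral density of the
additive noise. In a multicarrier system, with $\Delta f$ sufficiently small,
the $i$th subchannel has capacity

$$C_i = \Delta f \log_2\left[1 + \frac{\Delta f\, P(f_i)\, |C(f_i)|^2}{\Delta f\, \Phi_{nn}(f_i)}\right] \tag{12-2-3}$$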


Hence, the total capacity of the channel is

$$C = \sum_{i=1}^{N} C_i = \Delta f \sum_{i=1}^{N} \log_2\left[1 + \frac{P(f_i)\, |C(f_i)|^2}{\Phi_{nn}(f_i)}\right] \tag{12-2-4}$$


In the limit as $\Delta f \to 0$, we obtain the capacity of the overall channel in bits/s as

$$C = \int_W \log_2\left[1 + \frac{P(f)\, |C(f)|^2}{\Phi_{nn}(f)}\right] df \tag{12-2-5}$$
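
As a sanity check on (12-2-4) and (12-2-5), the finite sum can be evaluated for an ideal channel, where it should agree with (12-2-2). The Python sketch below assumes a flat channel ($|C(f)|^2 = 1$), white noise ($\Phi_{nn}(f) = N_0$), and a uniform power distribution $P(f) = P_{av}/W$; the function name and all numerical values are hypothetical.

```python
import math

def capacity_sum(power, gain_sq, noise_psd, delta_f):
    """Finite-sum capacity of (12-2-4), one list entry per subband."""
    return delta_f * sum(
        math.log2(1.0 + p * g / n) for p, g, n in zip(power, gain_sq, noise_psd)
    )

# Hypothetical ideal channel split into N subbands.
W, P_av, N0, N = 3000.0, 1.0, 1.0e-4, 64
delta_f = W / N
power = [P_av / W] * N     # uniform power spectral density
gain_sq = [1.0] * N        # |C(f)|^2 = 1 in every subband
noise_psd = [N0] * N       # white noise

c_sum = capacity_sum(power, gain_sq, noise_psd, delta_f)
c_ideal = W * math.log2(1.0 + P_av / (W * N0))   # (12-2-2)
print(c_sum, c_ideal)  # the two values agree for the flat channel
```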


Under the constraint on $P(f)$ given by (12-2-1), the choice of $P(f)$ that
maximizes $C$ may be determined by maximizing the integral

$$\int_W \left\{ \log_2\left[1 + \frac{P(f)\, |C(f)|^2}{\Phi_{nn}(f)}\right] + \lambda P(f) \right\} df \tag{12-2-6}$$


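where $\lambda$ is a Lagrange multiplier, which is chosen to satisfy the constraint. By
using the calculus of variations to perform the maximization, we find that the
optimum distribution of transmitted signal power is the solution to the
equation

$$\frac{|C(f)|^2}{|C(f)|^2 P(f) + \Phi_{nn}(f)} + \lambda = 0 \tag{12-2-7}$$

in which the constant factor $1/\ln 2$ arising from the base-2 logarithm has been
absorbed into $\lambda$.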


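Therefore, $P(f) + \Phi_{nn}(f)/|C(f)|^2$ must be a constant, whose value is adjusted
to satisfy the average power constraint in (12-2-1). That is, with $K$ denoting
this constant,

$$P(f) = \begin{cases} K - \Phi_{nn}(f)/|C(f)|^2 & (f \in W) \\ 0 & (f \notin W) \end{cases} \tag{12-2-8}$$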


This expression for the channel capacity of a nonideal linear filter channel with
additive gaussian noise is due to Shannon (1949). The basic interpretation of
this result is that the signal power should be high when the channel SNR
$|C(f)|^2/\Phi_{nn}(f)$ is high, and low when the channel SNR is low. This result on
the transmitted power distribution is illustrated in Fig. 12-2-1. Observe that if
$\Phi_{nn}(f)/|C(f)|^2$ is interpreted as the bottom of a bowl of unit depth, and we
pour an amount of water equal to $P_{av}$ into the bowl, the water will distribute
itself in the bowl so as to achieve capacity. This is called the water-filling
interpretation of the optimum power distribution as a function of frequency.

FIGURE 12-2-1 The optimum water-pouring spectrum.
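
The water-filling rule can be sketched for a channel that has been divided into discrete subbands. The Python example below is a minimal illustration, not a design procedure: it finds the water level $K$ by bisection and, as an added assumption that goes slightly beyond (12-2-8), clips the power to zero in any subband where $K - \Phi_{nn}(f_i)/|C(f_i)|^2$ would be negative. The function name and the numerical values are hypothetical.

```python
def water_fill(noise_to_gain, p_av, delta_f=1.0, iterations=200):
    """Return (K, powers) with P_i = max(K - n_i, 0) and delta_f * sum(P_i) = p_av.

    noise_to_gain[i] is Phi_nn(f_i) / |C(f_i)|^2, the "bottom of the bowl".
    """
    lo = min(noise_to_gain)                    # no power is allocated below this level
    hi = max(noise_to_gain) + p_av / delta_f   # level that certainly uses >= p_av
    for _ in range(iterations):
        k = 0.5 * (lo + hi)
        used = delta_f * sum(max(k - n, 0.0) for n in noise_to_gain)
        if used > p_av:
            hi = k
        else:
            lo = k
    k = 0.5 * (lo + hi)
    return k, [max(k - n, 0.0) for n in noise_to_gain]

# Hypothetical three-subband channel: the third subband is too noisy to use.
level, powers = water_fill([1.0, 2.0, 10.0], p_av=3.0)
print(level, powers)
```

In this example the water level settles at $K = 3$, so the first two subbands receive 2 and 1 units of power and the third receives none.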

It is interesting to note that the channel capacity is the smallest when the
channel SNR $|C(f)|^2/\Phi_{nn}(f)$ is a constant for all $f \in W$. In this case, $P(f)$ is a
constant for all $f \in W$. Equivalently, if the channel frequency response is ideal,
i.e., $C(f) = 1$ for $f \in W$, then the worst gaussian noise power distribution, from
the viewpoint of maximizing capacity, is white gaussian noise.

The above development suggests that multicarrier modulation that divides
the available channel bandwidth into subbands of relatively narrow width
$\Delta f = W/N$ provides a solution that could yield transmission rates close to
capacity. The signal in each subband may be independently coded and
modulated at a synchronous symbol rate of $\Delta f$ (symbol duration $1/\Delta f$), with
the optimum power allocation $P(f)$. If $\Delta f$ is small enough then $C(f)$ is essentially constant across
each subband, so that no equalization is necessary because the ISI is negligible.

Multicarrier modulation has been used in modems for both radio and 
telephone channels. Multicarrier modulation has also been proposed for future 
digital audio broadcast applications. 

A particularly suitable application of multicarrier modulation is in digital
transmission over copper wire subscriber loops. The typical channel
attenuation characteristics for such subscriber lines are illustrated in Fig. 12-2-2. We
observe that the attenuation increases rapidly as a function of frequency. This
characteristic makes it extremely difficult to achieve a high transmission rate
with a single modulated carrier and an equalizer at the receiver. The ISI
penalty in performance is very large. On the other hand, multicarrier
modulation with optimum power distribution provides the potential for a
higher transmission rate.

FIGURE 12-2-2 Attenuation characteristic of a 24 gauge 12 kft PIC loop.
[From Werner (1991) ©IEEE.]

The dominant noise in transmission over subscriber lines is crosstalk 
interference from signals carried on other telephone lines located in the same 
cable. The power distribution of this type of noise is also frequency- 
dependent, which can be taken into consideration in the allocation of the 
available transmitted power. 

A design procedure for a multicarrier QAM system for a nonideal linear 
filter channel has been given by Kalet (1989). In this procedure, the overall bit 
rate is maximized, through the design of an optimal power division among the 
subcarriers and an optimum selection of the number of bits per symbol (sizes 
of the QAM signal constellations) for each subcarrier, under an average power 
constraint and under the constraint that the symbol error probabilities for all 
subcarriers are equal. 

Below, we present an implementation of a multicarrier QAM modulator 
and demodulator that is based on the discrete Fourier transform (DFT) for the 
generation of the multiple carriers. 


12-2-2 An FFT-Based Multicarrier System

In this section, we describe a multicarrier communication system that employs 
the fast Fourier transform (FFT) algorithm to synthesize the signal at the 
transmitter and to demodulate the received signal at the receiver. The FFT is
simply an efficient computational tool for implementing the discrete Fourier
transform (DFT).

Figure 12-2-3 illustrates a block diagram of a multicarrier communication
system. A serial-to-parallel buffer segments the information sequence into
frames of $N_f$ bits. The $N_f$ bits in each frame are parsed into $\tilde{N}$ groups, where
the $i$th group is assigned $\tilde{n}_i$ bits, and

$$\sum_{i=1}^{\tilde{N}} \tilde{n}_i = N_f \tag{12-2-9}$$

Each group may be encoded separately, so that the number of output bits from
the encoder for the $i$th group is $n_i \geq \tilde{n}_i$.

FIGURE 12-2-3 Multicarrier communication system.

It is convenient to view the multicarrier modulation as consisting of $\tilde{N}$
independent QAM channels, each operating at the same symbol rate $1/T$, but
each channel having a distinct QAM constellation, i.e., the $i$th channel will
employ $M_i = 2^{n_i}$ signal points. We denote the complex-valued signal points
corresponding to the information symbols on the subchannels by $X_k$,
$k = 0, 1, \ldots, \tilde{N} - 1$. In order to modulate the $\tilde{N}$ subcarriers by the information
symbols $\{X_k\}$, we employ the inverse DFT (IDFT).

However, if we compute the $\tilde{N}$-point IDFT of $\{X_k\}$, we shall obtain a
complex-valued time series, which is not equivalent to $\tilde{N}$ QAM-modulated
subcarriers. Instead, we create $N = 2\tilde{N}$ information symbols by defining

$$X_{N-k} = X_k^*, \qquad k = 1, \ldots, \tilde{N} - 1 \tag{12-2-10}$$

and $X_0' = \mathrm{Re}(X_0)$, $X_{\tilde{N}} = \mathrm{Im}(X_0)$. Thus, the symbol $X_0$ is split into two parts,
both real. Then, the $N$-point IDFT yields the real-valued sequence

$$x_n = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X_k\, e^{j2\pi nk/N}, \qquad n = 0, 1, \ldots, N - 1 \tag{12-2-11}$$

where $1/\sqrt{N}$ is simply a scale factor.

The sequence $\{x_n,\ 0 \leq n \leq N - 1\}$ corresponds to the samples of the sum $x(t)$
of $\tilde{N}$ subcarrier signals, which is expressed as

$$x(t) = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X_k\, e^{j2\pi kt/T}, \qquad 0 \leq t \leq T \tag{12-2-12}$$

where $T$ is the symbol duration. We observe that the subcarrier frequencies are
$f_k = k/T$, $k = 0, 1, \ldots, \tilde{N}$. Furthermore, the discrete-time sequence $\{x_n\}$ in
(12-2-11) represents the samples of $x(t)$ taken at times $t = nT/N$ where
$n = 0, 1, \ldots, N - 1$.
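
The effect of the conjugate-symmetric extension can be checked numerically. The Python sketch below builds the $N = 2\tilde{N}$ symbols as in (12-2-10), computes the IDFT of (12-2-11) by direct summation (an FFT would be used in practice), and confirms that the resulting samples are real. The function names and the QAM points are assumptions made for illustration.

```python
import cmath
import math

def build_symmetric_block(symbols):
    """Extend N~ complex symbols to N = 2 * N~ symbols with Hermitian symmetry."""
    n_tilde = len(symbols)
    N = 2 * n_tilde
    block = [0j] * N
    block[0] = complex(symbols[0].real, 0.0)        # Re(X_0), real-valued
    block[n_tilde] = complex(symbols[0].imag, 0.0)  # Im(X_0), real-valued
    for k in range(1, n_tilde):
        block[k] = symbols[k]
        block[N - k] = symbols[k].conjugate()       # X_{N-k} = X_k^*
    return block

def idft(block):
    """N-point IDFT with the 1/sqrt(N) scale factor of (12-2-11)."""
    N = len(block)
    return [
        sum(block[k] * cmath.exp(2j * math.pi * n * k / N) for k in range(N))
        / math.sqrt(N)
        for n in range(N)
    ]

symbols = [1 + 1j, 3 - 1j, -1 + 3j, -3 - 3j]   # hypothetical QAM points, N~ = 4
samples = idft(build_symmetric_block(symbols))
print(max(abs(s.imag) for s in samples))  # imaginary parts vanish up to rounding
```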

The computation of the IDFT of the data $\{X_k\}$ as given in (12-2-11) may be
viewed as multiplication of each data point $X_k$ by a corresponding vector

$$\mathbf{v}_k = [v_{k0}\ \ v_{k1}\ \ \cdots\ \ v_{k(N-1)}] \tag{12-2-13}$$

where

$$v_{kn} = \frac{1}{\sqrt{N}}\, e^{j(2\pi/N)kn} \tag{12-2-14}$$

as illustrated in Fig. 12-2-4. In any case, the computation of the DFT is
performed efficiently by the use of the FFT algorithm.

FIGURE 12-2-4 Signal synthesis for multicarrier modulation based on inverse DFT.

In practice, the signal samples $\{x_n\}$ are passed through a D/A converter
whose output, ideally, would be the signal waveform $x(t)$. The output of the
channel is the waveform

$$r(t) = x(t) \star h(t) + n(t) \tag{12-2-15}$$

where $h(t)$ is the impulse response of the channel and $\star$ denotes convolution.
By selecting the bandwidth $\Delta f$ of each subchannel to be very small, the symbol
duration $T = 1/\Delta f$ is large compared with the channel time dispersion. To be
specific, let us assume that the channel dispersion spans $\nu + 1$ signal samples
where $\nu \ll N$. One way to avoid the effect of ISI is to insert a time guard band
of duration $\nu T/N$ between transmissions of successive blocks.

An alternative method that avoids ISI is to append a cyclic prefix to each
block of $N$ signal samples $\{x_0, x_1, \ldots, x_{N-1}\}$. The cyclic prefix for this block of
samples consists of the samples $x_{N-\nu}, x_{N-\nu+1}, \ldots, x_{N-1}$. These new samples
are appended to the beginning of each block. Note that the addition of the
cyclic prefix to the block of data increases the length of the block to $N + \nu$
samples, which may be indexed from $n = -\nu, \ldots, N - 1$, where the first $\nu$
samples constitute the prefix. Then, if $\{h_n,\ 0 \leq n \leq \nu\}$ denotes the sampled
channel impulse response, its convolution with $\{x_n,\ -\nu \leq n \leq N - 1\}$ produces
$\{r_n\}$, the received sequence. We are interested in the samples of $\{r_n\}$ for
$0 \leq n \leq N - 1$, from which we recover the transmitted sequence by using the
$N$-point DFT for demodulation. Thus, the first $\nu$ samples of $\{r_n\}$ are discarded.

From a frequency-domain viewpoint, when the channel impulse response is
$\{h_n,\ 0 \leq n \leq \nu\}$, its frequency response at the subcarrier frequencies $f_k = k/N$ is

$$H_k \equiv H\left(\frac{2\pi k}{N}\right) = \sum_{n=0}^{\nu} h_n\, e^{-j2\pi nk/N} \tag{12-2-16}$$

Due to the cyclic prefix, successive blocks (frames) of the transmitted
information sequence do not interfere and, hence, the demodulated sequence
may be expressed as

$$\hat{X}_k = H_k X_k + \eta_k, \qquad k = 0, 1, \ldots, N - 1 \tag{12-2-17}$$

where $\{\hat{X}_k\}$ is the output of the $N$-point DFT demodulator, and $\eta_k$ is the
additive noise corrupting the signal. We note that by selecting $N \gg \nu$, the rate
loss due to the cyclic prefix can be rendered negligible.
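
The relation (12-2-17) can be verified in the noise-free case with a short simulation. The Python sketch below is an illustration under stated assumptions: the channel taps and data symbols are hypothetical, the transforms are computed by direct summation rather than by an FFT, and the data block is an arbitrary complex sequence (the conjugate symmetry of (12-2-10) is not needed for this check).

```python
import cmath
import math

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * n * k / N) for k in range(N))
            / math.sqrt(N) for n in range(N)]

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
            / math.sqrt(N) for k in range(N)]

def channel_gains(h, N):
    """H_k of (12-2-16) for the sampled impulse response h_0, ..., h_nu."""
    return [sum(h[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(len(h)))
            for k in range(N)]

def send_block(X, h):
    """Modulate, add a cyclic prefix, pass through h, drop the prefix, demodulate."""
    N, nu = len(X), len(h) - 1
    x = idft(X)
    tx = x[N - nu:] + x                       # prefix = last nu samples of the block
    rx = [sum(h[m] * tx[i - m] for m in range(nu + 1) if i - m >= 0)
          for i in range(N + nu)]             # linear convolution, noise-free
    return dft(rx[nu:])                       # discard the first nu samples

X = [1 + 1j, -1 + 1j, 3 - 1j, -1 - 3j, 1 - 1j, 1 + 3j, -3 + 1j, -1 - 1j]
h = [1.0, 0.5, -0.2]                          # hypothetical channel, nu = 2
X_hat = send_block(X, h)
H = channel_gains(h, len(X))
print(max(abs(X_hat[k] - H[k] * X[k]) for k in range(len(X))))  # rounding error only
```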

As shown in Fig. 12-2-3, the information is demodulated by computing the
DFT of the received signal after it has been passed through an A/D converter.
The DFT computation may be viewed as a multiplication of the received signal
samples $\{r_n\}$ from the A/D converter by $\mathbf{v}_n^*$, where $\mathbf{v}_n$ is defined in (12-2-13). As
in the case of the modulator, the DFT computation at the demodulator is
performed efficiently by use of the FFT algorithm.

It is a simple matter to estimate and compensate for the channel factors {H k } 
prior to passing the data to the detector and decoder. A training signal 
consisting of either a known modulated sequence on each of the subcarriers or 
unmodulated subcarriers may be used to measure the $\{H_k\}$ at the receiver. If
the channel parameters vary slowly with time, it is also possible to track the 
time variations by using the decisions at the output of the detector or the 
decoder, in a decision-directed fashion. Thus, the multicarrier system can be 
rendered adaptive. 
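
A minimal sketch of training-based estimation and compensation is given below in Python. It assumes a simple per-subcarrier estimate, namely the average of $\hat{X}_k / X_k$ over the training blocks, followed by division of the data by the estimated $H_k$; this particular estimator, the function names, and the numerical values are illustrative assumptions, and decision-directed tracking is not shown.

```python
def estimate_channel(received_blocks, training):
    """Average X_hat_k / X_k over the training blocks, one value per subcarrier.

    Assumes every training symbol is nonzero.
    """
    n_blocks = len(received_blocks)
    return [
        sum(block[k] / training[k] for block in received_blocks) / n_blocks
        for k in range(len(training))
    ]

def equalize(received, h_est):
    """Compensate each subcarrier by dividing by its estimated gain (assumed nonzero)."""
    return [r / h for r, h in zip(received, h_est)]

# Hypothetical noise-free example with three subcarriers.
H_true = [1 + 0.5j, 0.8 + 0j, 0.3 - 0.2j]
training = [1 + 1j, -1 + 1j, 1 - 1j]
rx_training = [[h * x for h, x in zip(H_true, training)] for _ in range(2)]
H_est = estimate_channel(rx_training, training)

data = [-1 - 1j, 1 + 1j, 1 - 1j]
rx_data = [h * x for h, x in zip(H_true, data)]
print(equalize(rx_data, H_est))  # recovers the data symbols up to rounding
```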

Multicarrier QAM modulation of the type described above has been 
implemented for a variety of applications, including high-speed transmission 
over telephone lines, such as digital subscriber lines. 

Other types of implementation besides the DFT are possible. For example,
a digital filter bank that basically performs the DFT may be substituted for the
FFT-based implementation when the number of subcarriers is small, e.g.,
$\tilde{N} \leq 32$. For a large number of subcarriers, e.g., $\tilde{N} > 32$, the FFT-based systems
are computationally more efficient.

One limitation of the DFT-type modulators and demodulators arises from
the relatively large sidelobes in frequency that are inherent in DFT-type filter
banks. The first sidelobe is only 13 dB down from the peak at the desired
subcarrier. Consequently, the DFT-based implementations are vulnerable to
interchannel interference (ICI) unless a full cyclic prefix is used. If ICI is a
problem, due to channel anomalies, one may resort to other types of digital
filter banks that have much lower sidelobes. In particular, the class of multirate
digital filter banks that have the perfect reconstruction property associated
with wavelet-based filters appears to be an attractive alternative (see Tzannes et
al., 1994; Rizos et al., 1994).
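
The 13 dB figure can be reproduced approximately from the spectrum of a rectangular pulse. The Python sketch below assumes that each subcarrier filter has the magnitude response $|\sin(\pi x)/(\pi x)|$, with $x$ the frequency offset in units of $1/T$; this continuous-frequency model is an approximation to the DFT filter bank, and the function name and grid resolution are arbitrary choices.

```python
import math

def rect_pulse_magnitude(x):
    """|sin(pi x) / (pi x)|, with x the frequency offset in units of 1/T."""
    if x == 0:
        return 1.0
    return abs(math.sin(math.pi * x) / (math.pi * x))

# The first sidelobe lies between the spectral nulls at x = 1 and x = 2.
peak = max(rect_pulse_magnitude(1.0 + i / 10000.0) for i in range(1, 10000))
sidelobe_db = 20.0 * math.log10(peak)
print(sidelobe_db)  # roughly 13 dB below the main-lobe peak
```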

12-3 BIBLIOGRAPHICAL NOTES AND REFERENCES 

Multichannel signal transmission is commonly used on time-varying channels 
to overcome the effects of signal fading. This topic is treated in some detail in 
Chapter 14, where we provide a number of references to published work. Of 
particular relevance to the treatment of multichannel digital communications
given in this chapter are the two publications by Price (1962a,b). 

There is a large amount of literature on multicarrier digital communication 
systems. Such systems have been implemented and used for over 30 years. One 
of the earliest systems, described by Doeltz et al. (1957) and called Kineplex, 
was used for digital transmission in the HF band. Other early work on 
multicarrier system design has been reported in the papers by Chang (1966)
and Saltzberg (1967). The use of the DFT for modulation and demodulation of
multicarrier systems was proposed by Weinstein and Ebert (1971). 

Of particular interest in recent years is the use of multicarrier digital 
transmission for data, facsimile, and video on a variety of channels, including 
the narrowband (4 kHz) switched telephone network, the 48 kHz group 
telephone band, digital subscriber lines, cellular radio, and audio broadcast. 
The interested reader may refer to the many papers in the literature. We cite 
as examples the papers by Hirosaki et al. (1981, 1986), Chow et al. (1991), and 
the survey paper by Bingham (1990). The paper by Kalet (1989) gives a design 
procedure for optimizing the rate in a multicarrier QAM system given 
constraints on transmitter power and channel characteristics. Finally, we cite 
the book by Vaidyanathan (1993) and the papers by Tzannes et al. (1994) and 
Rizos et al. (1994) for a treatment of multirate digital filter banks. 


PROBLEMS 


12-1 $X_1, X_2, \ldots, X_N$ are a set of $N$ statistically independent and identically distributed
real gaussian random variables with moments $E(X_i) = m$ and $\mathrm{var}(X_i) = \sigma^2$.

a Define

$$U = \sum_{n=1}^{N} X_n$$

Evaluate the SNR of $U$, which is defined as

$$(\mathrm{SNR})_U = \frac{[E(U)]^2}{2\sigma_U^2}$$

where $\sigma_U^2$ is the variance of $U$.

b Define

$$V = \sum_{n=1}^{N} X_n^2$$

Evaluate the SNR of $V$, which is defined as

$$(\mathrm{SNR})_V = \frac{[E(V)]^2}{2\sigma_V^2}$$

where $\sigma_V^2$ is the variance of $V$.

c Plot $(\mathrm{SNR})_U$ and $(\mathrm{SNR})_V$ versus $m^2/\sigma^2$ on the same graph and, thus, compare
the SNRs graphically.

d What does the result in (c) imply regarding coherent detection and combining
versus square-law detection and combining of multichannel signals?

12-2 A binary communication system transmits the same information on two diversity
channels. The two received signals are

$$r_1 = \pm\sqrt{\mathcal{E}_b} + n_1$$
$$r_2 = \pm\sqrt{\mathcal{E}_b} + n_2$$

where $E(n_1) = E(n_2) = 0$, $E(n_1^2) = \sigma_1^2$ and $E(n_2^2) = \sigma_2^2$, and $n_1$ and $n_2$ are
uncorrelated gaussian variables. The detector bases its decision on the linear combination
of $r_1$ and $r_2$, i.e.,

$$r = r_1 + k r_2$$

a Determine the value of $k$ that minimizes the probability of error.
b Plot the probability of error for $\sigma_1^2 = 1$, $\sigma_2^2 = 3$, and either $k = 1$ or $k$ is the
optimum value found in (a). Compare the results.

12-3 Assess the cost of the cyclic prefix (used in multitone modulation to avoid ISI) in
terms of 

a extra channel bandwidth; 
b extra signal energy. 

12-4 Let $x(n)$ be a finite-duration signal with length $N$ and let $X(k)$ be its $N$-point DFT.
Suppose we pad $x(n)$ with $L$ zeros and compute the $(N + L)$-point DFT, $X'(k)$.
What is the relationship between $X(0)$ and $X'(0)$? If we plot $|X(k)|$ and $|X'(k)|$ on
the same graph, explain the relationships between the two graphs.

12-5 Show that the sequence $\{x_n\}$ given by (12-2-11) corresponds to the samples of the
signal x(t) given by (12-2-12). 

12-6 Show that the IDFT of a sequence $\{X_k,\ 0 \leq k \leq N - 1\}$ can be computed by passing
the sequence $\{X_k\}$ through a bank of $N$ linear discrete-time filters with system
functions

$$H_n(z) = \frac{1}{1 - e^{j2\pi n/N} z^{-1}}$$

12-7 Plot $P_2(L)$ for $L = 1$ and $L = 2$ as a function of $10 \log \gamma_b$ and determine the loss in
SNR due to the combining loss for = 10 
SPREAD SPECTRUM
SIGNALS FOR DIGITAL 
COMMUNICATIONS 


Spread spectrum signals used for the transmission of digital information are 
distinguished by the characteristic that their bandwidth W is much greater than 
the information rate R in bits/s. That is, the bandwidth expansion factor 
$B_e = W/R$ for a spread spectrum signal is much greater than unity. The large
redundancy inherent in spread spectrum signals is required to overcome the 
severe levels of interference that are encountered in the transmission of digital 
information over some radio and satellite channels. Since coded waveforms are 
also characterized by a bandwidth expansion factor greater than unity and 
since coding is an efficient method for introducing redundancy, it follows that 
coding is an important element in the design of spread spectrum signals. 

A second important element employed in the design of spread spectrum 
signals is pseudo-randomness, which makes the signals appear similar to 
random noise and difficult to demodulate by receivers other than the intended 
ones. This element is intimately related with the application or purpose of such 
signals. 

To be specific, spread spectrum signals are used for 

• combatting or suppressing the detrimental effects of interference due to 
jamming, interference arising from other users of the channel, and self- 
interference due to multipath propagation; 

• hiding a signal by transmitting it at low power and, thus, making it 
difficult for an unintended listener to detect in the presence of background 
noise; 

• achieving message privacy in the presence of other listeners. 

In applications other than communications, spread spectrum signals are used 
to obtain accurate range (time delay) and range rate (velocity) measurements
in radar and navigation. For the sake of brevity, we shall limit our discussion to 
digital communications applications. 

In combatting intentional interference (jamming), it is important to the 
communicators that the jammer who is trying to disrupt the communication 
does not have prior knowledge of the signal characteristics except for the 
overall channel bandwidth and the type of modulation (PSK, FSK, etc.) being
used. If the digital information is just encoded as described in Chapter 8, a 
sophisticated jammer can easily mimic the signal emitted by the transmitter 
and, thus, confuse the receiver. To circumvent this possibility, the transmitter 
introduces an element of unpredictability or randomness (pseudo-randomness) 
in each of the transmitted coded signal waveforms that is known to the 
intended receiver but not to the jammer. As a consequence, the jammer must 
synthesize and transmit an interfering signal without knowledge of the 
pseudo-random pattern. 

Interference from the other users arises in multiple-access communication 
systems in which a number of users share a common channel bandwidth. At 
any given time, a subset of these users may transmit information simul- 
taneously over the common channel to corresponding receivers. Assuming that 
all the users employ the same code for the encoding and decoding of their 
respective information sequences, the transmitted signals in this common 
spectrum may be distinguished from one another by superimposing a different 
pseudo-random pattern, also called a code, in each transmitted signal. Thus, a 
particular receiver can recover the transmitted information intended for it by 
knowing the pseudo-random pattern, i.e., the key, used by the corresponding 
transmitter. This type of communication technique, which allows multiple users 
to simultaneously use a common channel for transmission of information, is 
called code division multiple access (CDMA). CDMA will be considered in 
Sections 13-2 and 13-3. 

Resolvable multipath components resulting from time-dispersive propaga- 
tion through a channel may be viewed as a form of self-interference. This type 
of interference may also be suppressed by the introduction of a pseudo-random 
pattern in the transmitted signal, as will be described below. 

A message may be hidden in the background noise by spreading its 
bandwidth with coding and transmitting the resultant signal at a low average 
power. Because of its low power level, the transmitted signal is said to be
"covert." It has a low probability of being intercepted (detected) by a casual
listener and, hence, is also called a low-probability-of-intercept (LPI) signal.

Finally, message privacy may be obtained by superimposing a pseudo- 
random pattern on a transmitted message. The message can be demodulated 
by the intended receivers, who know the pseudo-random pattern or key used 
at the transmitter, but not by any other receivers who do not have knowledge 
of the key. 

In the following sections, we shall describe a number of different types of
spread spectrum signals, their characteristics, and their application. The
emphasis will be on the use of spread spectrum signals for combatting
jamming (antijam or AJ signals), for CDMA, and for LPI. Before discussing
the signal design problem, however, we shall briefly describe the types of
channel characteristics assumed for the applications cited above.

FIGURE 13-1-1 Model of spread spectrum digital communication system.

13-1 MODEL OF SPREAD SPECTRUM DIGITAL 
COMMUNICATION SYSTEM 

The block diagram shown in Fig. 13-1-1 illustrates the basic elements of a 
spread spectrum digital communication system with a binary information 
sequence at its input at the transmitting end and at its output at the receiving 
end. The channel encoder and decoder and the modulator and demodulator 
are basic elements of the system, which were treated in Chapters 5, 7 and 8. In 
addition to these elements, we have two identical pseudo-random pattern 
generators, one that interfaces with the modulator at the transmitting end and 
a second that interfaces with the demodulator at the receiving end. The 
generators generate a pseudo-random or pseudo-noise (PN) binary-valued 
sequence, which is impressed on the transmitted signal at the modulator and 
removed from the received signal at the demodulator. 

Synchronization of the PN sequence generated at the receiver with the PN 
sequence contained in the incoming received signal is required in order to 
demodulate the received signal. Initially, prior to the transmission of informa- 
tion, synchronization may be achieved by transmitting a fixed pseudo-random 
bit pattern that the receiver will recognize in the presence of interference with 
a high probability. After time synchronization of the generators is established, 
the transmission of information may commence. 

Interference is introduced in the transmission of the information-bearing 
signal through the channel. The characteristics of the interference depend to a 
large extent on its origin. It may be categorized as being either broadband or 
narrowband relative to the bandwidth of the information-bearing signal, and 
either continuous or pulsed (discontinuous) in time. For example, a jamming 
signal may consist of one or more sinusoids in the bandwidth used to transmit 
the information. The frequencies of the sinusoids may remain fixed or they 
may change with time according to some rule. As a second example, the 
interference generated in CDMA by other users of the channel may be either 
broadband or narrowband, depending on the type of spread spectrum signal
that is employed to achieve multiple access. If it is broadband, it may be 
characterized as an equivalent additive white gaussian noise. We shall consider 
these types of interference and some others in the following sections. 

Our treatment of spread spectrum signals will focus on the performance of 
the digital communication system in the presence of narrowband and broad- 
band interference. Two types of modulation are considered: PSK and FSK. 
PSK is appropriate in applications where phase coherence between the 
transmitted signal and the received signal can be maintained over a time 
interval that is relatively long compared to the reciprocal of the transmitted 
signal bandwidth. On the other hand, FSK modulation is appropriate in 
applications where such phase coherence cannot be maintained due to 
time-variant effects on the communications link. This may be the case in a 
communications link between two high-speed aircraft or between a high-speed 
aircraft and a ground terminal. 

The PN sequence generated at the modulator is used in conjunction with the 
PSK modulation to shift the phase of the PSK signal pseudo-randomly as 
described in Section 13-2. The resulting modulated signal is called a direct 
sequence (DS) or a pseudo-noise (PN) spread spectrum signal. When used in 
conjunction with binary or M-ary (M > 2) FSK, the pseudo-random sequence 
selects the frequency of the transmitted signal pseudo-randomly. The resulting 
signal is called a frequency-hopped (FH) spread spectrum signal. Although a 
number of other types of spread spectrum signals will be briefly described, the 
emphasis of our treatment will be on PN and FH spread spectrum signals. 

13-2 DIRECT SEQUENCE SPREAD SPECTRUM 
SIGNALS 

In the model shown in Fig. 13-1-1, we assume that the information rate at the 
input to the encoder is R bits/s and the available channel bandwidth is W Hz. 
The modulation is assumed to be binary PSK. In order to utilize the entire 
available channel bandwidth, the phase of the carrier is shifted pseudo- 
randomly according to the pattern from the PN generator at a rate W times/s. 
The reciprocal of W, denoted by $T_c$, defines the duration of a rectangular 
pulse, which is called a chip, while $T_c$ is called the chip interval. The pulse 
is the basic element in a DS spread spectrum signal. 

If we define $T_b = 1/R$ to be the duration of a rectangular pulse corresponding 
to the transmission time of an information bit, the bandwidth expansion 
factor W/R may be expressed as 

$$B_e = \frac{W}{R} = \frac{T_b}{T_c} \tag{13-2-1}$$

In practical systems, the ratio $T_b/T_c$ is an integer, 

$$L_c = \frac{T_b}{T_c} \tag{13-2-2}$$

which is the number of chips per information bit. That is, $L_c$ is the number of 
phase shifts that occur in the transmitted signal during the bit duration 
$T_b = 1/R$. Figure 13-2-1(a) illustrates the relationships between the PN signal 
and the data signal. 

FIGURE 13-2-1 The PN and data signals (a) and the QPSK modulator (b) for a DS spread spectrum system. 

Suppose that the encoder takes k information bits at a time and generates a 
binary linear (n, k) block code. The time duration available for transmitting 
the n code elements is $kT_b$ s. The number of chips that occur in this time 
interval is $kL_c$. Hence, we may select the block length of the code as $n = kL_c$. 
If the encoder generates a binary convolutional code of rate k/n, the number 
of chips in the time interval $kT_b$ is also $n = kL_c$. Therefore, the following 
discussion applies to both block codes and convolutional codes. 
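
The relations $T_b = 1/R$, $T_c = 1/W$, $L_c = T_b/T_c$ and $n = kL_c$ can be checked with a few lines of Python. This is only a sketch: the function name `ds_parameters` and the numerical values (R = 10 kbits/s, W = 1 MHz, k = 12) are illustrative assumptions, not values taken from the text.

```python
def ds_parameters(bit_rate, bandwidth, k):
    """Return (T_b, T_c, L_c, n) for a DS system with W/R an integer.

    bit_rate  : information rate R in bits/s
    bandwidth : available channel bandwidth W in Hz
    k         : number of information bits per code word (hypothetical input)
    """
    t_b = 1.0 / bit_rate          # bit interval T_b = 1/R
    t_c = 1.0 / bandwidth         # chip interval T_c = 1/W
    ratio = bandwidth / bit_rate  # bandwidth expansion factor W/R
    l_c = round(ratio)
    if abs(ratio - l_c) > 1e-9:
        raise ValueError("W/R is assumed to be an integer number of chips per bit")
    n = k * l_c                   # block length n = k * L_c
    return t_b, t_c, l_c, n


# Illustrative numbers only.
t_b, t_c, l_c, n = ds_parameters(bit_rate=10_000, bandwidth=1_000_000, k=12)
print(l_c, n)
```

With these assumed values each information bit spans 100 chips, so a code with k = 12 would use a block length of 1200 chips.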

One method for impressing the PN sequence on the transmitted signal is to 
alter directly the coded bits by modulo-2 addition with the PN sequence.† 
Thus, each coded bit is altered by its addition with a bit from the PN sequence. 
If $b_i$ represents the ith bit of the PN sequence and $c_i$ is the corresponding bit 
from the encoder, the modulo-2 sum is 

$$a_i = b_i \oplus c_i \tag{13-2-3}$$

Hence, $a_i = 1$ if either $b_i = 1$ and $c_i = 0$ or $b_i = 0$ and $c_i = 1$; also, $a_i = 0$ 
if either $b_i = 1$ and $c_i = 1$ or $b_i = 0$ and $c_i = 0$. We may say that $a_i = 0$ when 
$b_i = c_i$ and $a_i = 1$ when $b_i \ne c_i$. The sequence $\{a_i\}$ is mapped into a binary 
PSK signal of the form $s(t) = \pm\mathrm{Re}\,[g(t)e^{j2\pi f_c t}]$ according to the convention 

$$g_i(t) = \begin{cases} g(t - iT_c) & (a_i = 0) \\ -g(t - iT_c) & (a_i = 1) \end{cases} \tag{13-2-4}$$

where g(t) represents a pulse of duration $T_c$ s and arbitrary shape. 

The modulo-2 addition of the coded sequence $\{c_i\}$ and the sequence $\{b_i\}$ 
from the PN generator may also be represented as a multiplication of two 
waveforms. To demonstrate this point, suppose that the elements of the coded 
sequence are mapped into a binary PSK signal according to the relation 

$$c_i(t) = (2c_i - 1)g(t - iT_c) \tag{13-2-5}$$

Similarly, we define a waveform $p_i(t)$ as 

$$p_i(t) = (2b_i - 1)p(t - iT_c) \tag{13-2-6}$$

where p(t) is a rectangular pulse of duration $T_c$. Then the equivalent lowpass 
transmitted signal corresponding to the ith coded bit is 

$$\begin{aligned} g_i(t) &= p_i(t)c_i(t) \\ &= (2b_i - 1)(2c_i - 1)g(t - iT_c) \end{aligned} \tag{13-2-7}$$

This signal is identical to the one given by (13-2-4), which is obtained from the 
sequence $\{a_i\}$. Consequently, modulo-2 addition of the coded bits with the PN 
sequence followed by a mapping that yields a binary PSK signal is equivalent 
to multiplying a binary PSK signal generated from the coded bits with a 
sequence of unit amplitude rectangular pulses, each of duration $T_c$, and with a 
polarity which is determined from the PN sequence according to (13-2-6). 
Although it is easier to implement modulo-2 addition followed by PSK 
modulation instead of waveform multiplication, it is convenient, for purposes 
of demodulation, to consider the transmitted signal in the multiplicative form 
given by (13-2-7). A functional block diagram of a four-phase PSK DS spread 
spectrum modulator is shown in Fig. 13-2-1(b). 

† When four-phase PSK is desired, one PN sequence is added to the information sequence carried 
on the in-phase signal component and a second PN sequence is added to the information sequence 
carried on the quadrature component. In many PN-spread spectrum systems, the same binary 
information sequence is added to the two PN sequences to form the two quadrature components. 
Thus, a four-phase PSK signal is generated with a binary information stream. 
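
The equivalence between modulo-2 addition followed by PSK mapping, (13-2-3) and (13-2-4), and the product form (13-2-7) can be verified chip by chip. The sketch below works with chip polarities only (the common pulse g(t - iT_c) is factored out); the helper names and the random test sequences are assumptions made for illustration.

```python
import random


def polarity_from_xor(coded_bits, pn_bits):
    """Polarity of each chip via a_i = b_i XOR c_i, then a_i = 0 -> +1, a_i = 1 -> -1."""
    return [1 - 2 * (b ^ c) for b, c in zip(pn_bits, coded_bits)]


def polarity_from_product(coded_bits, pn_bits):
    """Polarity of each chip via the product (2 b_i - 1)(2 c_i - 1)."""
    return [(2 * b - 1) * (2 * c - 1) for b, c in zip(pn_bits, coded_bits)]


rng = random.Random(1)  # fixed seed so the run is repeatable
coded = [rng.randint(0, 1) for _ in range(32)]  # stand-in for encoder output {c_i}
pn = [rng.randint(0, 1) for _ in range(32)]     # stand-in for PN sequence {b_i}

# Both constructions give the same +/-1 chip polarities.
print(polarity_from_xor(coded, pn) == polarity_from_product(coded, pn))
```

Because the two lists agree for every combination of $b_i$ and $c_i$, either implementation may be used at the modulator, as the text states.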

The received equivalent lowpass signal for the ith code element is† 

$$\begin{aligned} r_i(t) &= p_i(t)c_i(t) + z(t), \qquad iT_c \le t \le (i + 1)T_c \\ &= (2b_i - 1)(2c_i - 1)g(t - iT_c) + z(t) \end{aligned} \tag{13-2-8}$$

where z(t) represents the interference or jamming signal corrupting the 
information-bearing signal. The interference is assumed to be a stationary 
random process with zero mean. 

If z(t) is a sample function from a complex-valued gaussian process, the 
optimum demodulator may be implemented either as a filter matched to the 
waveform g(t) or as a correlator, as illustrated by the block diagrams in Fig. 
13-2-2. In the matched filter realization, the sampled output from the matched 
filter is multiplied by $2b_i - 1$, which is obtained from the PN generator at the 
demodulator when the PN generator is properly synchronized. Since 
$(2b_i - 1)^2 = 1$ when $b_i = 0$ and $b_i = 1$, the effect of the PN sequence on the 
received coded bits is thus removed. 

FIGURE 13-2-2 Possible demodulator structures for PN spread spectrum signals: panels (a), (b), and (c). 

† For simplicity, we assume that the channel attenuation $\alpha = 1$ and the phase shift of the 
channel is zero. Since coherent PSK detection is assumed, any arbitrary channel phase shift is 
compensated for in the demodulation. 

In Fig. 13-2-2, we also observe that the cross-correlation can be accomplished 
in either one of two ways. The first, illustrated in Fig. 13-2-2(b), involves 
premultiplying $r_i(t)$ with the waveform $p_i(t)$ generated from the output of the 
PN generator and then cross-correlating with g*(t) and sampling the output in 
each chip interval. The second method, illustrated in Fig. 13-2-2(c), involves 
cross-correlation with g*(t) first, sampling the output of the correlator and, 
then, multiplying this output with $2b_i - 1$, which is obtained from the PN 
generator. 

If z(t) is not a gaussian random process, the demodulation methods 
illustrated in Fig. 13-2-2 are no longer optimum. Nevertheless, we may still use 
any of these three demodulator structures to demodulate the received signal. 
When the statistical characteristics of the interference z(t) are unknown a 
priori, this is certainly one possible approach. An alternative method, which is 
described later, utilizes an adaptive filter prior to the matched filter or 
correlator to suppress narrowband interference. The rationale for this second 
method is also described later. 

In Section 13-2-1, we derive the error rate performance of the DS spread 
spectrum system in the presence of wideband and narrowband interference. 
The derivations are based on the assumption that the demodulator is any of 
the three equivalent structures shown in Fig. 13-2-2. 

13-2-1 Error Rate Performance of the Decoder 

Let the unquantized output of the demodulator be denoted by $y_j$, $1 \le j \le n$. 
First we consider a linear binary (n, k) block code and, without loss of 
generality, we assume that the all-zero code word is transmitted. 

A decoder that employs soft-decision decoding computes the correlation 
metrics 

$$CM_i = \sum_{j=1}^{n} (2c_{ij} - 1)y_j, \qquad i = 1, 2, \ldots, 2^k \tag{13-2-9}$$

where $c_{ij}$ denotes the jth bit in the ith code word. The correlation metric 
corresponding to the all-zero code word is 

$$\begin{aligned} CM_1 &= 2n\mathcal{E}_c + \sum_{j=1}^{n} (2c_{1j} - 1)(2b_j - 1)\nu_j \\ &= 2n\mathcal{E}_c - \sum_{j=1}^{n} (2b_j - 1)\nu_j \end{aligned} \tag{13-2-10}$$

where $\nu_j$, $1 \le j \le n$, is the additive noise term corrupting the jth coded bit and 
$\mathcal{E}_c$ is the chip energy. The noise term is defined as 

$$\nu_j = \mathrm{Re}\left\{\int_0^{T_c} g^*(t)\,z[t + (j - 1)T_c]\,dt\right\}, \qquad j = 1, 2, \ldots, n \tag{13-2-11}$$

Similarly, the correlation metric corresponding to code word $C_m$ having 
weight $w_m$ is 

$$CM_m = 2\mathcal{E}_c n\left(1 - \frac{2w_m}{n}\right) + \sum_{j=1}^{n} (2c_{mj} - 1)(2b_j - 1)\nu_j \tag{13-2-12}$$

Following the procedure used in Section 8-1-4, we shall determine the 
probability that $CM_m > CM_1$. The difference between $CM_1$ and $CM_m$ is 

$$\begin{aligned} D &= CM_1 - CM_m \\ &= 4\mathcal{E}_c w_m - 2\sum_{j=1}^{n} c_{mj}(2b_j - 1)\nu_j \end{aligned} \tag{13-2-13}$$

Since the code word $C_m$ has weight $w_m$, there are $w_m$ nonzero components in 
the summation of noise terms contained in (13-2-13). We shall assume that the 
minimum distance of the code is sufficiently large that we can invoke the 
central limit theorem for the summation of noise components. This assumption 
is valid for PN spread spectrum signals that have a bandwidth expansion of 20 
or more.† Thus, the summation of noise components is modeled as a gaussian 
random variable. Since $E(2b_j - 1) = 0$ and $E(\nu_j) = 0$, the mean of the second 
term in (13-2-13) is also zero. 

The variance is 

$$\sigma_m^2 = 4\sum_{j=1}^{n}\sum_{i=1}^{n} c_{mi}c_{mj}\,E[(2b_j - 1)(2b_i - 1)]\,E(\nu_i\nu_j) \tag{13-2-14}$$

The sequence of binary digits from the PN generator is assumed to be 
uncorrelated. Hence, 

$$E[(2b_j - 1)(2b_i - 1)] = \delta_{ij} \tag{13-2-15}$$

and 

$$\sigma_m^2 = 4w_m E(\nu^2) \tag{13-2-16}$$

where $E(\nu^2)$ is the second moment of any one element from the set $\{\nu_j\}$. This 
moment is easily evaluated to yield 

$$\begin{aligned} E(\nu^2) &= \int_0^{T_c}\!\!\int_0^{T_c} g^*(t)g(\tau)\phi_{zz}(t - \tau)\,dt\,d\tau \\ &= \int_{-\infty}^{\infty} |G(f)|^2\,\Phi_{zz}(f)\,df \end{aligned} \tag{13-2-17}$$

where $\phi_{zz}(\tau) = \tfrac{1}{2}E[z^*(t)z(t + \tau)]$ is the autocorrelation function and $\Phi_{zz}(f)$ is 
the power spectral density of the interference z(t). 

† Typically, the bandwidth expansion factor in a spread spectrum signal is of the order of 100 
and higher. 

We observe that when the interference is spectrally flat within the 
bandwidth† occupied by the transmitted signal, i.e., 

$$\Phi_{zz}(f) = J_0, \qquad |f| \le \tfrac{1}{2}W \tag{13-2-18}$$

the second moment in (13-2-17) is $E(\nu^2) = 2\mathcal{E}_c J_0$, and, hence, the variance of 
the interference term in (13-2-16) becomes 

$$\sigma_m^2 = 8\mathcal{E}_c J_0 w_m \tag{13-2-19}$$

In this case, the probability that D < 0 is 

$$P_2(m) = Q\left(\sqrt{\frac{2\mathcal{E}_c}{J_0}w_m}\right) \tag{13-2-20}$$

But the energy per coded bit $\mathcal{E}_c$ may be expressed in terms of the energy per 
information bit $\mathcal{E}_b$ as 

$$\mathcal{E}_c = \frac{k}{n}\mathcal{E}_b = R_c\mathcal{E}_b \tag{13-2-21}$$

With this substitution, (13-2-20) becomes 

$$P_2(m) = Q\left(\sqrt{2\gamma_b R_c w_m}\right) \tag{13-2-22}$$

where $\gamma_b = \mathcal{E}_b/J_0$ is the SNR per information bit. Finally, the code word error 
probability may be upper-bounded by the union bound as 

$$P_M \le \sum_{m=2}^{M} Q\left(\sqrt{2\gamma_b R_c w_m}\right) \tag{13-2-23}$$

where $M = 2^k$. Note that this expression is identical to the probability of a code 
word error for soft-decision decoding of a linear binary block code in an 
AWGN channel. 
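
The union bound (13-2-23) is straightforward to evaluate once the weight distribution of the code is known. The following Python sketch groups code words by weight; the function names are hypothetical, and the weight distribution used in the example (a Hamming (7, 4) code with 7 words of weight 3, 7 of weight 4 and 1 of weight 7) is an illustrative input that does not come from this section.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x), expressed through the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def union_bound_soft(weight_counts, code_rate, snr_per_bit):
    """Union bound of (13-2-23) on the code word error probability.

    weight_counts : dict mapping each nonzero weight w_m to the number of code words
    code_rate     : R_c = k/n
    snr_per_bit   : gamma_b = E_b/J_0 as a linear ratio (not dB)
    """
    total = 0.0
    for weight, count in weight_counts.items():
        total += count * q_function(math.sqrt(2.0 * snr_per_bit * code_rate * weight))
    return total


# Illustrative example: Hamming (7, 4) weight distribution at gamma_b = 8 dB.
hamming_7_4 = {3: 7, 4: 7, 7: 1}
gamma_b = 10.0 ** (8.0 / 10.0)
print(union_bound_soft(hamming_7_4, 4.0 / 7.0, gamma_b))
```

For a repetition code, where the single nonzero word has weight n and $R_c = 1/n$, the bound collapses to $Q(\sqrt{2\gamma_b})$, which gives a quick sanity check of the routine.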

Although we have considered a binary block code in the derivation given 
above, the procedure is similar for an (n, k) convolutional code. The result of 
such a derivation is the following upper bound on the equivalent bit error 
probability: 

$$P_b \le \frac{1}{k}\sum_{d=d_{\mathrm{free}}}^{\infty} \beta_d\,Q\left(\sqrt{2\gamma_b R_c d}\right) \tag{13-2-24}$$

The set of coefficients $\{\beta_d\}$ is obtained from an expansion of the derivative of 
the transfer function T(D, N), as described in Section 8-2-3. 

† If the bandwidth of the bandpass channel is W, that of the equivalent low-pass channel is 
$\tfrac{1}{2}W$. 

Next, we consider a narrowband interference centered at the carrier (at d.c. 
for the equivalent lowpass signal). We may fix the total (average) jamming 
power to $J_{av} = J_0 W$, where $J_0$ is the value of the power spectral density of an 
equivalent wideband interference (jamming signal). The narrowband 
interference is characterized by the power spectral density 

$$\Phi_{zz}(f) = \begin{cases} \dfrac{J_{av}}{W_1} = \dfrac{J_0 W}{W_1} & (|f| \le \tfrac{1}{2}W_1) \\ 0 & (|f| > \tfrac{1}{2}W_1) \end{cases} \tag{13-2-25}$$

where $W \gg W_1$. 

Substitution of (13-2-25) for $\Phi_{zz}(f)$ into (13-2-17) yields 

$$E(\nu^2) = \frac{J_{av}}{W_1}\int_{-W_1/2}^{W_1/2} |G(f)|^2\,df \tag{13-2-26}$$

The value of $E(\nu^2)$ depends on the spectral characteristics of the pulse g(t). In 
the following example, we consider two special cases. 


Example 13-2-1 

Suppose that g(t) is a rectangular pulse as shown in Fig. 13-2-3(a) and 
$|G(f)|^2$ is the corresponding energy density spectrum shown in Fig. 
13-2-3(b). For the narrowband interference given by (13-2-26), the variance 
of the total interference is 

$$\begin{aligned} \sigma_m^2 &= 4w_m E(\nu^2) \\ &= \frac{8\mathcal{E}_c w_m T_c J_{av}}{W_1}\int_{-W_1/2}^{W_1/2}\left(\frac{\sin \pi f T_c}{\pi f T_c}\right)^2 df \\ &= \frac{8\mathcal{E}_c w_m J_{av}}{W_1}\int_{-\beta/2}^{\beta/2}\left(\frac{\sin \pi x}{\pi x}\right)^2 dx \end{aligned} \tag{13-2-27}$$

FIGURE 13-2-3 Rectangular pulse (a) and its energy density spectrum (b). 

FIGURE 13-2-4 Plot of the value of the integral in (13-2-27). 

where $\beta = W_1 T_c$. Figure 13-2-4 illustrates the value of this integral for 
$0 \le \beta \le 1$. We observe that the value of the integral is upper-bounded by 
$W_1 T_c$. Hence, $\sigma_m^2 \le 8\mathcal{E}_c w_m T_c J_{av}$. 

In the limit as $W_1$ becomes zero, the interference becomes an impulse at 
the carrier. In this case the interference is a pure frequency tone and it is 
usually called a CW jamming signal. The power spectral density is 

$$\Phi_{zz}(f) = J_{av}\,\delta(f) \tag{13-2-28}$$

and the corresponding variance for the decision variable $D = CM_1 - CM_m$ is 

$$\begin{aligned} \sigma_m^2 &= 4w_m J_{av}|G(0)|^2 \\ &= 8w_m\mathcal{E}_c T_c J_{av} \end{aligned} \tag{13-2-29}$$

The probability of a code word error for CW jamming is upper-bounded as 

$$P_M \le \sum_{m=2}^{M} Q\left(\sqrt{\frac{2\mathcal{E}_c}{J_{av}T_c}w_m}\right) \tag{13-2-30}$$

But $\mathcal{E}_c = R_c\mathcal{E}_b$. Furthermore, $T_c \approx 1/W$ and $J_{av}/W = J_0$. Therefore (13-2-30) 
may be expressed as 

$$P_M \le \sum_{m=2}^{M} Q\left(\sqrt{\frac{2\mathcal{E}_b}{J_0}R_c w_m}\right) \tag{13-2-31}$$

which is the result obtained previously for broadband interference. This 
result indicates that a CW jammer has the same effect on performance as an 
equivalent broadband jammer. This equivalence is discussed further below. 


Example 13-2-2 


Let us determine the performance of the DS spread spectrum system in the 
presence of a CW jammer of average power $J_{av}$ when the transmitted signal 
pulse g(t) is one-half cycle of a sinusoid as illustrated in Fig. 13-2-5, i.e., 

$$g(t) = \sqrt{\frac{4\mathcal{E}_c}{T_c}}\,\sin\frac{\pi t}{T_c}, \qquad 0 \le t \le T_c \tag{13-2-32}$$

FIGURE 13-2-5 A sinusoidal signal pulse. 

The variance of the interference of this pulse is 

$$\begin{aligned} \sigma_m^2 &= 4w_m J_{av}|G(0)|^2 \\ &= \frac{64}{\pi^2}\mathcal{E}_c T_c J_{av} w_m \end{aligned} \tag{13-2-33}$$

Hence, the upper bound on the code word error probability is 

$$P_M \le \sum_{m=2}^{M} Q\left(\sqrt{\frac{\pi^2\mathcal{E}_c}{4J_{av}T_c}w_m}\right) \tag{13-2-34}$$

We observe that the performance obtained with this pulse is 0.9 dB better 
than that obtained with a rectangular pulse. Recall that this pulse shape 
when used in offset QPSK results in an MSK signal. MSK modulation is 
frequently used in DS spread spectrum systems. 
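
The 0.9 dB figure follows from comparing $|G(0)|^2$ for the two pulses, since for CW jamming the variance of the interference term is proportional to $|G(0)|^2$. The sketch below integrates both pulses numerically. It assumes that each pulse is normalized to energy $2\mathcal{E}_c$ (consistent with $E(\nu^2) = 2\mathcal{E}_c J_0$ for flat interference), so the rectangular pulse is given amplitude $\sqrt{2\mathcal{E}_c/T_c}$; the helper name `g0_squared` and the unit values of $\mathcal{E}_c$ and $T_c$ are illustrative.

```python
import math


def g0_squared(pulse, t_c, steps=10_000):
    """|G(0)|^2 = (integral of g(t) over [0, T_c])^2, using the midpoint rule."""
    dt = t_c / steps
    area = sum(pulse((i + 0.5) * dt) for i in range(steps)) * dt
    return area * area


e_c, t_c = 1.0, 1.0  # arbitrary units; the ratio below does not depend on them


def rect(t):
    # Rectangular pulse with energy 2*E_c (assumed normalization).
    return math.sqrt(2.0 * e_c / t_c)


def half_sine(t):
    # Half-cycle sinusoid of (13-2-32), which also has energy 2*E_c.
    return math.sqrt(4.0 * e_c / t_c) * math.sin(math.pi * t / t_c)


# A smaller |G(0)|^2 means less CW interference reaches the decision variable.
gain_db = 10.0 * math.log10(g0_squared(rect, t_c) / g0_squared(half_sine, t_c))
print(round(gain_db, 2))  # about 0.91 dB, i.e. 10*log10(pi^2/8)
```

The ratio of the two values of $|G(0)|^2$ is $2\mathcal{E}_c T_c / (16\mathcal{E}_c T_c/\pi^2) = \pi^2/8$, which is the same factor that separates the arguments of the Q function in (13-2-30) and (13-2-34).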


The Processing Gain and the Jamming Margin An interesting interpretation 
of the performance characteristics for the DS spread spectrum signal is 
obtained by expressing the signal energy per bit $\mathcal{E}_b$ in terms of the average 
power. That is, $\mathcal{E}_b = P_{av}T_b$, where $P_{av}$ is the average signal power and $T_b$ is the 
bit interval. Let us consider the performance obtained in the presence of CW 
jamming for the rectangular pulse treated in Example 13-2-1. When we 
substitute for $\mathcal{E}_b$ and $J_0$ into (13-2-31), we obtain 

$$P_M \le \sum_{m=2}^{M} Q\left(\sqrt{\frac{2P_{av}T_b}{J_{av}T_c}R_c w_m}\right) = \sum_{m=2}^{M} Q\left(\sqrt{\frac{2L_c}{J_{av}/P_{av}}R_c w_m}\right) \tag{13-2-35}$$

where $L_c$ is the number of chips per information bit and $P_{av}/J_{av}$ is the 
signal-to-jamming power ratio. 

An identical result is obtained with broadband jamming for which the 
performance is given by (13-2-23). For the signal energy per bit, we have 

$$\mathcal{E}_b = P_{av}T_b = \frac{P_{av}}{R} \tag{13-2-36}$$

where R is the information rate in bits/s. The power spectral density for the 
jamming signal may be expressed as 

$$J_0 = \frac{J_{av}}{W} \tag{13-2-37}$$

Using the relation in (13-2-36) and (13-2-37), the ratio $\mathcal{E}_b/J_0$ may be 
expressed as 

$$\frac{\mathcal{E}_b}{J_0} = \frac{P_{av}/R}{J_{av}/W} = \frac{W/R}{J_{av}/P_{av}} \tag{13-2-38}$$

The ratio $J_{av}/P_{av}$ is the jamming-to-signal power ratio, which is usually 
greater than unity. The ratio $W/R = T_b/T_c = B_e = L_c$ is just the bandwidth 
expansion factor, or, equivalently, the number of chips per information bit. 
This ratio is usually called the processing gain of the DS spread spectrum 
system. It represents the advantage gained over the jammer that is obtained by 
expanding the bandwidth of the transmitted signal. If we interpret $\mathcal{E}_b/J_0$ as the 
SNR required to achieve a specified error rate performance and W/R as the 
available bandwidth expansion factor, the ratio $J_{av}/P_{av}$ is called the jamming 
margin of the DS spread spectrum system. In other words, the jamming margin 
is the largest value that the ratio $J_{av}/P_{av}$ can take and still satisfy the specified 
error probability. 
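
In decibels, (13-2-38) says that the jamming margin equals the processing gain minus the required $\mathcal{E}_b/J_0$. A minimal sketch of that calculation follows. It assumes uncoded binary PSK, for which the bit error probability is taken to be $Q(\sqrt{2\mathcal{E}_b/J_0})$, and finds the required SNR by bisection; the function names, the target error rate of $10^{-6}$ and the processing gain of 1000 are illustrative choices.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def required_snr_db(target_error_rate):
    """Smallest E_b/J_0 (in dB) with Q(sqrt(2*snr)) <= target, found by bisection.

    Assumes uncoded binary PSK; a coded system would need a different error expression.
    """
    lo, hi = 0.0, 100.0  # bracket on the linear SNR
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if q_function(math.sqrt(2.0 * mid)) > target_error_rate:
            lo = mid
        else:
            hi = mid
    return 10.0 * math.log10(hi)


def jamming_margin_db(processing_gain, snr_required_db):
    """Jamming margin from (13-2-38): 10 log10(W/R) minus the required E_b/J_0 in dB."""
    return 10.0 * math.log10(processing_gain) - snr_required_db


snr_db = required_snr_db(1e-6)
print(round(snr_db, 1), round(jamming_margin_db(1000.0, snr_db), 1))
```

With these assumed inputs the required SNR comes out close to 10.5 dB and the margin close to 19.5 dB. Any coding gain would add directly to the margin.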

The performance of a soft-decision decoder for a linear (n, k) binary code, 
expressed in terms of the processing gain and the jamming margin, is 

$$P_M \le \sum_{m=2}^{M} Q\left(\sqrt{\frac{2W/R}{J_{av}/P_{av}}R_c w_m}\right) \tag{13-2-39}$$

In addition to the processing gain W/R and $J_{av}/P_{av}$, we observe that the 
performance depends on a third factor, namely, $R_c w_m$. This factor is the coding 
gain. A lower bound on this factor is $R_c d_{min}$. Thus the jamming margin 
achieved by the DS spread spectrum signal depends on the processing gain and 
the coding gain. 



Uncoded DS Spread Spectrum Signals The performance results given 
above for DS spread spectrum signals generated by means of an (n, k) code 
may be specialized to a trivial type of code, namely, a binary repetition code. 
For this case, k = 1 and the weight of the nonzero code word is w = n. Thus, 
$R_c w = 1$ and, hence, the performance of the binary signaling system reduces to 

$$P_2 = Q\left(\sqrt{\frac{2\mathcal{E}_b}{J_0}}\right) = Q\left(\sqrt{\frac{2W/R}{J_{av}/P_{av}}}\right) \tag{13-2-40}$$

Note that the trivial (repetition) code gives no coding gain. It does result in 
a processing gain of W/R. 


Example 13-2-3 

Suppose that we wish to achieve an error rate performance of $10^{-6}$ or less 
with an uncoded DS spread spectrum system. The available bandwidth 
expansion factor is W/R = 1000. Let us determine the jamming margin. 

The $\mathcal{E}_b/J_0$ required to achieve a bit error probability of $10^{-6}$ with 
uncoded binary PSK is 10.5 dB. The processing gain is $10\log_{10} 1000 = 30$ dB. 
Hence the maximum jamming-to-signal power that can be tolerated, i.e., the 
jamming margin, is 

$$10\log_{10}\frac{J_{av}}{P_{av}} = 30 - 10.5 = 19.5\ \mathrm{dB}$$

Since this is the jamming margin achieved with an uncoded DS spread 
spectrum system, it may be increased by coding the information sequence. 


There is another way to view the modulation and demodulation processes 
for the uncoded (repetition code) DS spread spectrum system. At the 
modulator, the signal waveform generated by the repetition code with 
rectangular pulses, for example, is identical to a unit amplitude rectangular 
pulse s(t) of duration $T_b$ or its negative, depending on whether the information 
bit is 1 or 0, respectively. This may be seen from (13-2-7), where the coded 
chips $\{c_i\}$ within a single information bit are either all 1s or 0s. The PN 
sequence multiplies either s(t) or -s(t). Thus, when the information bit is a 1, 
the $L_c$ PN chips generated by the PN generator are transmitted with the same 
polarity. On the other hand, when the information bit is a 0, the $L_c$ PN chips 
when multiplied by -s(t) are reversed in polarity. 

The demodulator for the repetition code, implemented as a correlator, is 
illustrated in Fig. 13-2-6. We observe that the integration interval in the 
integrator is the bit interval $T_b$. Thus, the decoder for the repetition code is 
eliminated and its function is subsumed in the demodulator. 

FIGURE 13-2-6 Correlation-type demodulator for a repetition code. 

Now let us qualitatively assess the effect of this demodulation process on 
the interference z(t). The multiplication of z(t) by the output of the PN 
generator, which is expressed as 

$$w(t) = \sum_i (2b_i - 1)p(t - iT_c)$$

yields 

$$v(t) = w(t)z(t)$$

The waveforms w(t) and z(t) are statistically independent random processes 
each with zero mean and autocorrelation functions $\phi_{ww}(\tau)$ and $\phi_{zz}(\tau)$, 
respectively. The product v(t) is also a random process having an 
autocorrelation function equal to the product of $\phi_{ww}(\tau)$ with $\phi_{zz}(\tau)$. Hence, 
the power spectral density of the process v(t) is equal to the convolution of the 
power spectral density of w(t) with the power spectral density of z(t). 

The effect of convolving the two spectra is to spread the power in 
bandwidth. Since the bandwidth of w(t) occupies the available channel 
bandwidth W, the result of convolution of the two spectra is to spread the power 
spectral density of z(t) over the frequency band of width W. If z(t) is a 
narrowband process, i.e., its power spectral density has a width much less than 
W, the power spectral density of the process v(t) will occupy a bandwidth 
equal to at least W. 

The integrator used in the cross-correlation shown in Fig. 13-2-6 has a 
bandwidth approximately equal to $1/T_b$. Since $1/T_b \ll W$, only a fraction of the 
total interference power appears at the output of the correlator. This fraction is 
approximately equal to the ratio of bandwidths $1/T_b$ to W. That is, 

$$\frac{1/T_b}{W} = \frac{1}{WT_b} = \frac{T_c}{T_b} = \frac{1}{L_c}$$

In other words, the multiplication of the interference with the signal from the 
PN generator spreads the interference to the signal bandwidth W, and the 
narrowband integration following the multiplication sees only the fraction $1/L_c$ 
of the total interference. Thus, the performance of the uncoded DS spread 
spectrum system is enhanced by the processing gain $L_c$. 
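
The $1/L_c$ reduction can be illustrated with a small Monte Carlo experiment. The sketch below makes several simplifying assumptions that are not in the text: the interference is a CW tone at the carrier (a constant in the equivalent lowpass model), the PN chips are modeled as independent equiprobable ±1 values, and the integrator output is normalized by the number of chips per bit. The function name and the trial count are arbitrary.

```python
import random


def residual_interference_power(chips_per_bit, trials=2000, tone_amplitude=1.0, seed=0):
    """Estimate the mean-square correlator output caused by a CW tone.

    Each trial multiplies the constant tone by L_c random +/-1 chips and averages
    over the bit interval, mimicking the multiply-and-integrate demodulator.
    """
    rng = random.Random(seed)  # fixed seed for a repeatable estimate
    total = 0.0
    for _ in range(trials):
        acc = 0.0
        for _ in range(chips_per_bit):
            acc += tone_amplitude * rng.choice((-1.0, 1.0))
        output = acc / chips_per_bit  # integrate over T_b, normalized by L_c
        total += output * output
    return total / trials


for l_c in (10, 100):
    estimate = residual_interference_power(l_c)
    # estimate * l_c should be close to 1, i.e. the power is reduced by about 1/L_c
    print(l_c, round(estimate * l_c, 2))
```

A desired signal that carries the same PN chips would add coherently and leave the integrator with unchanged amplitude, so under these assumptions the signal-to-interference ratio improves by roughly the factor $L_c$.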

Linear Code Concatenated with a Binary Repetition Code As illustrated 
above, a binary repetition code provides a margin against an interference or 
jamming signal but yields no coding gain. To obtain an improvement in 
performance, we may use a linear $(n_1, k)$ block or convolutional code, where 
$n_1 \le n = kL_c$. One possibility is to select $n_1 < n$ and to repeat each code bit $n_2$ 
times such that $n = n_1 n_2$. Thus, we can construct a linear (n, k) code by 
concatenating the $(n_1, k)$ code with a binary $(n_2, 1)$ repetition code. This may 
be viewed as a trivial form of code concatenation where the outer code is the 
$(n_1, k)$ code and the inner code is the repetition code. 

Since the repetition code yields no coding gain, the coding gain achieved by 
the combined code must reduce to that achieved by the $(n_1, k)$ outer code. It 
is demonstrated that this is indeed the case. The coding gain of the overall 
combined code is 

$$R_c w_m = \frac{k}{n}w_m, \qquad m = 2, 3, \ldots, 2^k$$

But the weights $\{w_m\}$ for the combined code may be expressed as 

$$w_m = n_2 w_m^o$$

where $\{w_m^o\}$ are the weights of the outer code. Therefore, the coding gain of the 
combined code is 

$$R_c w_m = \frac{k}{n_1 n_2}n_2 w_m^o = \frac{k}{n_1}w_m^o = R_c^o w_m^o \tag{13-2-41}$$

which is just the coding gain obtained from the outer code. 
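
The identity in (13-2-41) can be checked by brute force on a small code. The sketch below enumerates every nonzero code word of an outer code, repeats each code bit $n_2$ times, and compares $R_c w_m$ for the combined code with $R_c^o w_m^o$ for the outer code. The particular (7, 4) generator matrix and the repetition factor of 5 are illustrative assumptions, as are the function names.

```python
from itertools import product

# Generator matrix of a small systematic (7, 4) linear code (illustrative choice).
GENERATOR = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(message, generator):
    """Multiply the message row vector by the generator matrix over GF(2)."""
    n = len(generator[0])
    return [sum(m * row[j] for m, row in zip(message, generator)) % 2 for j in range(n)]


def repeat_bits(word, n2):
    """Inner (n2, 1) repetition code: each outer code bit is sent n2 times."""
    return [bit for bit in word for _ in range(n2)]


def coding_gain_pairs(generator, n2):
    """Return (R_c^o * w_m^o, R_c * w_m) for every nonzero code word."""
    k, n1 = len(generator), len(generator[0])
    pairs = []
    for message in product((0, 1), repeat=k):
        if not any(message):
            continue  # skip the all-zero code word
        outer = encode(message, generator)
        combined = repeat_bits(outer, n2)
        outer_factor = (k / n1) * sum(outer)
        combined_factor = (k / (n1 * n2)) * sum(combined)
        pairs.append((outer_factor, combined_factor))
    return pairs


pairs = coding_gain_pairs(GENERATOR, n2=5)
print(all(abs(a - b) < 1e-12 for a, b in pairs))
```

The check prints True for any repetition factor, since repetition multiplies every weight by $n_2$ and divides the rate by the same factor.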

A coding gain is also achieved if the outer code is decoded using 
hard decisions. The probability of a bit error obtained with the $(n_2, 1)$ 
repetition code (based on soft-decision decoding) is 

$$p = Q\left(\sqrt{\frac{2n_2\mathcal{E}_c}{J_0}}\right) = Q\left(\sqrt{\frac{2\mathcal{E}_b}{J_0}R_c^o}\right) \tag{13-2-42}$$

Then the code word error probability for a linear $(n_1, k)$ block code is 
upper-bounded as 

$$P_M \le \sum_{m=t+1}^{n_1}\binom{n_1}{m}p^m(1 - p)^{n_1 - m} \tag{13-2-43}$$

where $t = \lfloor\tfrac{1}{2}(d_{min} - 1)\rfloor$, or as 

$$P_M \le \sum_{m=2}^{M}[4p(1 - p)]^{w_m^o/2} \tag{13-2-44}$$

where the latter is a Chernoff bound. For an $(n_1, k)$ binary convolutional code, 
the upper bound on the bit error probability is 

$$P_b \le \frac{1}{k}\sum_{d=d_{\mathrm{free}}}^{\infty}\beta_d P_2(d) \tag{13-2-45}$$

where $P_2(d)$ is defined by (8-2-28) for odd d and by (8-2-29) for even d. 

Concatenated Coding for DS Spread Spectrum Systems It is apparent 
from the above discussion that an improvement in performance can be 
obtained by replacing the repetition code by a more powerful code that will 
yield a coding gain in addition to the processing gain. Basically, the objective in 
a DS spread spectrum system is to construct a long, low-rate code having a 
large minimum distance. This may be best accomplished by using code 
concatenation. When binary PSK is used in conjunction with DS spread 
spectrum, the elements of a concatenated code word must be expressed in 
binary form. 

Best performance is obtained when soft-decision decoding is used on both 
the inner and outer codes. However, an alternative, which usually results in 
reduced complexity for the decoder, is to employ soft-decision decoding on the 
inner code and hard-decision decoding on the outer code. The expressions for 
the error rate performance of these decoding schemes depend, in part, on the 
type of codes (block or convolutional) selected for the inner and outer codes. 
For example, the concatenation of two block codes may be viewed as an 
overall long binary (n, k) block code having a performance given by (13-2-39). 
The performance of other code combinations may also be readily derived. For 
the sake of brevity, we shall not consider such code combinations. 


13-2-2 Some Applications of DS Spread Spectrum Signals 

In this subsection, we shall briefly consider the use of coded DS spread 
spectrum signals for three specific applications. One is concerned with 
providing immunity against a jamming signal. In the second, a communication 
signal is hidden in the background noise by transmitting the signal at a very 
low power level. The third application is concerned with accommodating a 
number of simultaneous signal transmissions on the same channel, i.e., 
CDMA. 


Antijamming Application In Section 13-2-1, we derived the error rate 
performance for a DS spread spectrum signal in the presence of either a 
narrowband or a wideband jamming signal. As examples to illustrate the 
performance of a digital communications system in the presence of a jamming 
signal, we shall select three codes. One is the Golay (24, 12), which is 
characterized by the weight distribution given in Table 8-1-1 and has a 
minimum distance d min = 8. The second code is an expurgated Golay (24, 11) 
obtained by selecting 2048 code words of constant weight 12. Of course this 
expurgated code is nonlinear. These two codes will be used in conjunction with 
a repetition code. The third code to be considered is a maximum-length 
shift-register code. 


The error rate performance of the Golay (24, 12) with soft-decision 
decoding is 

$$\begin{aligned} P_M \le{} & 759\,Q\left(\sqrt{\frac{8W/R}{J_{av}/P_{av}}}\right) + 2576\,Q\left(\sqrt{\frac{12W/R}{J_{av}/P_{av}}}\right) \\ & + 759\,Q\left(\sqrt{\frac{16W/R}{J_{av}/P_{av}}}\right) + Q\left(\sqrt{\frac{24W/R}{J_{av}/P_{av}}}\right) \end{aligned} \tag{13-2-46}$$

where W/R is the processing gain and $J_{av}/P_{av}$ is the jamming margin. Since 
$n = n_1 n_2 = 12W/R$ and $n_1 = 24$, each coded bit is, in effect, repeated 
$n_2 = W/2R$ times. For example, if W/R = 100 (a processing gain of 20 dB), the 
block length of the repetition code is $n_2 = 50$. 

If hard-decision decoding is used, the probability of error for a coded bit is 

$$p = Q\left(\sqrt{\frac{W/R}{J_{av}/P_{av}}}\right) \tag{13-2-47}$$

and the corresponding probability of a code word error is upper-bounded as 

$$P_M \le \sum_{m=4}^{24}\binom{24}{m}p^m(1 - p)^{24 - m} \tag{13-2-48}$$

As an alternative, we may use the Chernoff bound for hard-decision decoding, 
which is 

$$\begin{aligned} P_M \le{} & 759[4p(1 - p)]^4 + 2576[4p(1 - p)]^6 \\ & + 759[4p(1 - p)]^8 + [4p(1 - p)]^{12} \end{aligned} \tag{13-2-49}$$


Figure 13-2-7 illustrates the performance of the Golay (24, 12) as a function of 
the jamming margin $J_{av}/P_{av}$, with the processing gain as a parameter. The 
Chernoff bound was used to compute the error probability for hard-decision 
decoding. The error probability for soft-decision decoding is dominated by the 
term 

$$759\,Q\left(\sqrt{\frac{8W/R}{J_{av}/P_{av}}}\right)$$

and that for hard-decision decoding is dominated by the term $759[4p(1 - p)]^4$. 
Hence, the coding gain for soft-decision decoding† is at most 10 log 4 = 6 dB. 
We note that the two curves corresponding to W/R = 1000 (30 dB) are 
identical in shape to the ones for W/R = 100 (20 dB), except that the latter are 
shifted by 10 dB to the right relative to the former. This shift is simply the 
difference in processing gain between these two DS spread spectrum signals. 
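
The soft-decision bound (13-2-46) and the Chernoff bound (13-2-49) are simple to tabulate. The sketch below uses the Golay (24, 12) weight counts that appear in those equations and takes the coded-bit error probability for hard decisions to be $p = Q(\sqrt{(W/R)/(J_{av}/P_{av})})$, which is the general expression with $R_c = 1/2$. The function names and the sample operating point are illustrative assumptions.

```python
import math

# Weight distribution of the Golay (24, 12) code as used in (13-2-46) and (13-2-49).
GOLAY_WEIGHTS = {8: 759, 12: 2576, 16: 759, 24: 1}


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def golay_soft_bound(processing_gain, jam_to_signal):
    """Union bound for soft-decision decoding; with R_c = 1/2, 2*R_c*w_m = w_m."""
    return sum(count * q_function(math.sqrt(weight * processing_gain / jam_to_signal))
               for weight, count in GOLAY_WEIGHTS.items())


def golay_hard_chernoff(processing_gain, jam_to_signal):
    """Chernoff bound for hard-decision decoding of the outer Golay code."""
    p = q_function(math.sqrt(processing_gain / jam_to_signal))  # coded-bit error rate
    z = 4.0 * p * (1.0 - p)
    return sum(count * z ** (weight / 2) for weight, count in GOLAY_WEIGHTS.items())


# Sample point: W/R = 100 (20 dB) and J_av/P_av = 10 (10 dB); both are linear ratios.
print(golay_soft_bound(100.0, 10.0), golay_hard_chernoff(100.0, 10.0))
```

Both bounds depend on the processing gain and the jamming-to-signal ratio only through their quotient, which is why curves for different processing gains have the same shape and differ by a horizontal shift in dB.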

The error rate performance of the expurgated Golay (24, 11) is upper-bounded 
as 

$$P_M \le 2047\,Q\left(\sqrt{\frac{11W/R}{J_{av}/P_{av}}}\right) \tag{13-2-50}$$

for soft-decision decoding and as‡ 

$$P_M \le 2047[4p(1 - p)]^6 \tag{13-2-51}$$

for hard-decision decoding, where p is given as 

$$p = Q\left(\sqrt{\frac{11W/R}{12J_{av}/P_{av}}}\right) \tag{13-2-52}$$

The performance characteristics of this code are also plotted in Fig. 13-2-7 for 
W/R = 100. We observe that this expurgated Golay (24, 11) code performs 
about 1 dB better than the Golay (24, 12) code. 

FIGURE 13-2-7 

† The coding gain is less than 6 dB due to the multiplicative factor of 759, which increases the 
error probability relative to the performance of the binary uncoded system. 

‡ We remind the reader that the union bound is not very tight for large signal sets. 

Instead of using a block code concatenated with a low-rate ($1/n_2$) repetition 
code, let us consider using a single low-rate code. A particularly suitable set of 
low-rate codes is the set of maximum-length shift-register codes described in 
Section 8-1-3. We recall that for this set of codes, 

$$\begin{aligned} (n, k) &= (2^m - 1, m) \\ d_{min} &= 2^{m-1} \end{aligned} \tag{13-2-53}$$

All code words except the all-zero word have an identical weight of $2^{m-1}$. 
Hence, the error rate for soft-decision decoding is upper-bounded as† 

$$\begin{aligned} P_M &\le (M - 1)\,Q\left(\sqrt{\frac{2W/R}{J_{av}/P_{av}}R_c d_{min}}\right) \\ &\le 2^m\,Q\left(\sqrt{\frac{2W/R}{J_{av}/P_{av}}\,\frac{m2^{m-1}}{2^m - 1}}\right) \\ &\le 2^m\exp\left(-\frac{W/R}{J_{av}/P_{av}}\,\frac{m2^{m-1}}{2^m - 1}\right) \end{aligned} \tag{13-2-54}$$

For moderate values of m, $R_c d_{min} \approx \tfrac{1}{2}m$ and, hence, (13-2-54) may be expressed 
as 

$$P_M \le 2^m\,Q\left(\sqrt{\frac{mW/R}{J_{av}/P_{av}}}\right) \le 2^m\exp\left(-\frac{mW/R}{2J_{av}/P_{av}}\right) \tag{13-2-55}$$

Hence, the coding gain is at most $10\log\tfrac{1}{2}m$. 

For example, if we select m = 10 then $n = 2^{10} - 1 = 1023$. Since n = kW/R = 
mW/R, it follows that $W/R \approx 102$. Thus, we have a processing gain of about 
20 dB and a coding gain of 7 dB. This performance is comparable to that 
obtained with the expurgated Golay (24, 11) code. Higher coding gains can be 
achieved with larger values of m. 
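
The numbers in this example follow directly from (13-2-53). The short sketch below computes the block length, minimum distance, implied processing gain and the coding-gain factor $R_c d_{min}$ for a maximum-length shift-register code; the function name is hypothetical, and the relation n = mW/R is the one assumed in the paragraph above.

```python
import math


def max_length_code_summary(m):
    """Parameters of the (2^m - 1, m) maximum-length shift-register code."""
    n = 2 ** m - 1                   # block length
    d_min = 2 ** (m - 1)             # weight of every nonzero code word
    rate = m / n                     # code rate R_c = k/n with k = m
    processing_gain = n / m          # from n = k * (W/R)
    coding_gain_db = 10.0 * math.log10(rate * d_min)  # 10 log10(R_c * d_min)
    return n, d_min, processing_gain, coding_gain_db


n, d_min, pg, cg_db = max_length_code_summary(10)
print(n, d_min, round(pg, 1), round(10.0 * math.log10(pg), 1), round(cg_db, 1))
```

For m = 10 this gives n = 1023 and a processing gain of about 102 (roughly 20 dB), with $R_c d_{min}$ just above 5, i.e. close to 7 dB.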

If hard-decision decoding is used for the maximum-length shift-register 
codes, the error rate is upper-bounded by the Chernoff bound as 

$$P_M \le (M - 1)[4p(1 - p)]^{d_{min}/2} = (2^m - 1)[4p(1 - p)]^{2^{m-2}} \tag{13-2-56}$$

where p is given as 

$$p = Q\left(\sqrt{\frac{2W/R}{J_{av}/P_{av}}R_c}\right) = Q\left(\sqrt{\frac{2W/R}{J_{av}/P_{av}}\,\frac{m}{2^m - 1}}\right) \tag{13-2-57}$$

For m = 10, the code word error rate $P_M$ is comparable to that obtained with 
the expurgated Golay (24, 11) code for hard-decision decoding. 

The results given above illustrate the performance that can be obtained with 
a single level of coding. Greater coding gains can be achieved with 
concatenated codes. 

† The $M = 2^m$ waveforms generated by a maximum-length shift-register code form a simplex set 
(see Problem 8-13). The exact expression for the error probability, given in Section 5-2-4, may be 
used for large values of M, where the union bound becomes very loose. 
Low-Detectability Signal Transmission In this application, the signal is 
purposely transmitted at a very low power level relative to the background 
channel noise and thermal noise that is generated in the front end of the 
receiver. If the DS spread spectrum signal occupies a bandwidth W and the 
spectral density of the additive noise is N 0 W/Hz, the average noise power in 
the bandwidth W is N av = WN 0 . 

The average received signal power at the intended receiver is $P_{av}$. If we wish
to hide the presence of the signal from receivers that are in the vicinity of the
intended receiver, the signal is transmitted at a low power level such that
$P_{av}/N_{av} \ll 1$. The intended receiver can recover the information-bearing signal
with the aid of the processing gain and the coding gain. However, any other 
receiver that has no prior knowledge of the PN sequence is unable to take 
advantage of the processing gain and the coding gain. Hence, the presence of 
the information-bearing signal is difficult to detect. We say that the signal has a 
low probability of being intercepted (LPI) and it is called an LPI signal. 

The probability of error results given in Section 13-2-1 also apply to the 
demodulation and decoding of LPI signals at the intended receiver. 


Code Division Multiple Access The enhancement in performance obtained
from a DS spread spectrum signal through the processing gain and
coding gain can be used to enable many DS spread spectrum signals to occupy
the same channel bandwidth provided that each signal has its own distinct PN 
sequence. Thus, it is possible to have several users transmit messages 
simultaneously over the same channel bandwidth. This type of digital 
communication in which each user (transmitter-receiver pair) has a distinct PN 
code for transmitting over a common channel bandwidth is called either code 
division multiple access (CDMA) or spread spectrum multiple access (SSMA). 

In the demodulation of each PN signal, the signals from the other 
simultaneous users of the channel appear as an additive interference. The level 
of interference varies, depending on the number of users at any given time. A 
major advantage of CDMA is that a large number of users can be accommodated
if each transmits messages for a short period of time. In such a multiple
access system, it is relatively easy either to add new users or to decrease the 
number of users without disrupting the system. 

Let us determine the number of simultaneous signals that can be supported
in a CDMA system.† For simplicity, we assume that all signals have identical
average powers. Thus, if there are $N_u$ simultaneous users, the desired
signal-to-interference power ratio at a given receiver is

$$\frac{P_{av}}{J_{av}} = \frac{P_{av}}{(N_u - 1)P_{av}} = \frac{1}{N_u - 1} \tag{13-2-58}$$


† In this section the interference from other users is treated as a random process. This is the
case if there is no cooperation among the users. In Chapter 15 we consider CDMA transmission in
which interference from other users is known and is suppressed by the receiver.

Hence, the performance for soft-decision decoding at the given receiver is
upper-bounded as

$$P_M \le \sum_{m=2}^{M} Q\left(\sqrt{\frac{2W/R}{N_u - 1}R_c w_m}\right) \tag{13-2-59}$$

In this case, we have assumed that the interference from other users is
gaussian.

As an example, suppose that the desired level of performance (error
probability of $10^{-6}$) is achieved when

$$\frac{W/R}{N_u - 1}R_c d_{\min} = 20$$

Then the maximum number of users that can be supported in the CDMA
system is

$$N_u = \frac{W/R}{20}R_c d_{\min} + 1 \tag{13-2-60}$$

If $W/R = 100$ and $R_c d_{\min} = 4$, as obtained with the Golay (24, 12) code, the
maximum number is $N_u = 21$. If $W/R = 1000$ and $R_c d_{\min} = 4$, this number
becomes $N_u = 201$.
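The user-count formula (13-2-60) is simple enough to evaluate directly. The following sketch reproduces the two cases quoted above; the function name `max_cdma_users` and the parameter `required_snr` (the value 20 assumed in the example) are illustrative choices, not part of the original text.

```python
import math

def max_cdma_users(w_over_r, rc_dmin, required_snr=20.0):
    """Largest N_u with (W/R) * R_c * d_min / (N_u - 1) >= required_snr."""
    return math.floor(w_over_r * rc_dmin / required_snr) + 1

users_100 = max_cdma_users(100, 4)    # Golay (24, 12): R_c * d_min = 4
users_1000 = max_cdma_users(1000, 4)
print(users_100, users_1000)
```

The count scales linearly with the bandwidth expansion factor W/R, which is the main point of the example.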

In determining the maximum number of simultaneous users of the channel, 
we have implicitly assumed that the PN code sequences are mutually 
orthogonal and the interference from other users adds on a power basis only. 
However, orthogonality among a number of PN code sequences is not easily 
achieved, especially if the number of PN code sequences required is large. In 
fact, the selection of a good set of PN sequences for a CDMA system is an 
important problem that has received considerable attention in the technical 
literature. We shall briefly discuss this problem in Section 13-2-4.


13-2-3 Effect of Pulsed Interference on DS Spread Spectrum 
Systems 

Thus far, we have considered the effect of continuous interference or jamming 
on a DS spread spectrum signal. We have observed that the processing gain 
and coding gain provide a means for overcoming the detrimental effects of this 
type of interference. However, there is a jamming threat that has a dramatic 
effect on the performance of a DS spread spectrum system. That jamming 
signal consists of pulses of spectrally flat noise that covers the entire signal 
bandwidth W. This is usually called pulsed interference or partial-time jamming. 

Suppose the jammer has an average power $J_{av}$ in the signal bandwidth $W$.
Hence $J_0 = J_{av}/W$. Instead of transmitting continuously, the jammer transmits
pulses at a power $J_{av}/\alpha$ for a fraction $\alpha$ of the time, i.e., the probability that the
jammer is transmitting at a given instant is $\alpha$. For simplicity, we assume that
an interference pulse spans an integral number of signaling intervals and, thus,
it affects an integral number of bits. When the jammer is not transmitting, the
transmitted bits are assumed to be received error-free, and when the jammer is
transmitting, the probability of error for an uncoded DS spread spectrum
system is $Q(\sqrt{2\alpha\mathcal{E}_b/J_0})$. Hence, the average probability of a bit error is

$$P_2(\alpha) = \alpha Q\left(\sqrt{2\alpha\mathcal{E}_b/J_0}\right) = \alpha Q\left(\sqrt{\frac{2\alpha W/R}{J_{av}/P_{av}}}\right) \tag{13-2-61}$$


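The jammer selects the duty cycle $\alpha$ to maximize the error probability. On
differentiating (13-2-61) with respect to $\alpha$, we find that the worst-case pulse
jamming occurs when

$$\alpha^* = \begin{cases} \dfrac{0.71}{\mathcal{E}_b/J_0} & (\mathcal{E}_b/J_0 \ge 0.71) \\ 1 & (\mathcal{E}_b/J_0 < 0.71) \end{cases} \tag{13-2-62}$$

and the corresponding error probability is

$$P_2 = \begin{cases} \dfrac{0.083}{\mathcal{E}_b/J_0} = \dfrac{0.083\,J_{av}/P_{av}}{W/R} & (\mathcal{E}_b/J_0 \ge 0.71) \\ Q\left(\sqrt{2\mathcal{E}_b/J_0}\right) = Q\left(\sqrt{\dfrac{2W/R}{J_{av}/P_{av}}}\right) & (\mathcal{E}_b/J_0 < 0.71) \end{cases} \tag{13-2-63}$$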


The error rate performance given by (13-2-61) for $\alpha = 1.0$, 0.1, and 0.01
along with the worst-case performance based on $\alpha^*$ is plotted in Fig. 13-2-8.

FIGURE 13-2-8 Performance of DS binary PSK with pulse jamming.

By comparing the error rate for continuous gaussian noise jamming with
worst-case pulse jamming, we observe a large difference in performance, which
is approximately 40 dB at an error rate of $10^{-6}$.
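The worst-case duty cycle can be cross-checked by brute force. The sketch below evaluates the average error probability of (13-2-61) on a grid of duty cycles and compares the maximizer with the closed-form results (13-2-62) and (13-2-63). The function names, the grid resolution, and the test value of the SNR are assumptions made for illustration.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pulse_jam_error(alpha, ebj0):
    """Average bit error probability for duty cycle alpha, as in (13-2-61)."""
    return alpha * q_function(math.sqrt(2.0 * alpha * ebj0))

def worst_case_pulse_jamming(ebj0):
    """Closed-form worst-case duty cycle and error probability."""
    if ebj0 >= 0.71:
        return 0.71 / ebj0, 0.083 / ebj0
    return 1.0, q_function(math.sqrt(2.0 * ebj0))

ebj0 = 100.0                                  # 20 dB, linear scale
alpha_star, p_star = worst_case_pulse_jamming(ebj0)

# Brute-force search over duty cycles in (0, 1]
grid = [k / 10000 for k in range(1, 10001)]
alpha_grid = max(grid, key=lambda a: pulse_jam_error(a, ebj0))
print(alpha_star, alpha_grid, p_star, pulse_jam_error(alpha_grid, ebj0))

# SNR needed for 1e-6 under worst-case pulse jamming, from P_2 = 0.083 / (E_b/J_0)
snr_pulse_db = 10 * math.log10(0.083 / 1e-6)
print(snr_pulse_db)
```

Because the worst-case error probability falls only inversely with the SNR rather than exponentially, the SNR required for a low error rate is tens of decibels higher than with continuous jamming.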

We should point out that the above analysis applies when the jammer pulse 
duration is equal to or greater than the bit duration. In addition, we should 
indicate that practical considerations may prohibit the jammer from achieving 
high peak power (small values of a). Nevertheless, the error probability given 
by (13-2-63) serves as an upper bound on the performance of the uncoded 
binary PSK in worst-case pulse jamming. Clearly, the performance of the DS 
spread spectrum system in the presence of such jamming is extremely poor. 

If we simply add coding to the DS spread spectrum system, the improvement
over the uncoded system is the coding gain. Thus, the required $\mathcal{E}_b/J_0$ is reduced by the
coding gain, which in most cases is limited to less than 10 dB. The reason for
the poor performance is that the jamming signal pulse duration may be 
selected to affect many consecutive coded bits when the jamming signal is 
turned on. Consequently, the code word error probability is high due to the 
burst characteristics of the jammer. 

In order to improve the performance, we should interleave the coded bits 
prior to transmission over the channel. The effect of the interleaving, as 
discussed in Section 8-1-9, is to make the coded bits that are hit by the jammer 
statistically independent. 

The block diagram of the digital communication system that includes 
interleaving/deinterleaving is shown in Fig. 13-2-9. Also shown is the possibility
that the receiver knows the jammer state, i.e., that it knows when
the jammer is on or off. Knowledge of the jammer state (called side 
information) is sometimes available from channel measurements of noise 
power levels in adjacent frequency bands. In our treatment, we consider two
extreme cases, namely, no knowledge of the jammer state or complete
knowledge of the jammer state. In any case, the random variable $\zeta$
representing the jammer state is characterized by the probabilities

$$P(\zeta = 1) = \alpha, \qquad P(\zeta = 0) = 1 - \alpha$$

When the jammer is on, the channel is modeled as an AWGN with power
spectral density $N_0 = J_0/\alpha = J_{av}/\alpha W$; and when the jammer is off, there is no
noise in the channel. Knowledge of the jammer state implies that the decoder
knows when $\zeta = 1$ and when $\zeta = 0$, and uses this information in the
computation of the correlation metrics. For example, the decoder may weight
the demodulator output for each coded bit by the reciprocal of the noise power
level in the interval. Alternatively, the decoder may give zero weight (erasure)
to a jammed bit.

FIGURE 13-2-9 Block diagram of AJ communication system.

First, let us consider the effect of jamming without knowledge of the jammer 
state. The interleaver/deinterleaver pair is assumed to result in statistically 
independent jammer hits of the coded bits. As an example of the performance 
achieved with coding, we cite the performance results from the paper of Martin 
and McAdam (1980). There the performance of binary convolutional codes is 
evaluated for worst-case pulse jamming. Both hard and soft-decision Viterbi 
decoding are considered. Soft decisions are obtained by quantizing the 
demodulator output to eight levels. For this purpose, a uniform quantizer is 
used for which the threshold spacing is optimized for the pulse jammer noise 
level. The quantizer plays the important role of limiting the size of the 
demodulator output when the pulse jammer is on. The limiting action ensures 
that any hit on a coded bit does not heavily bias the corresponding path 
metrics. 

The optimum duty cycle for the pulse jammer in the coded system is
generally inversely proportional to the SNR, but its value is different from that
given by (13-2-62) for the uncoded system. Figure 13-2-10 illustrates graphically
the optimal jammer duty cycle for both hard- and soft-decision decoding
of the rate 1/2 convolutional codes. The corresponding error rate results for
this worst-case pulse jammer are illustrated in Figs. 13-2-11 and 13-2-12 for rate
1/2 codes with constraint lengths $3 \le K \le 9$. For example, note that at
$P_2 = 10^{-6}$, the $K = 7$ convolutional code with soft-decision decoding requires
$\mathcal{E}_b/J_0 = 7.6$ dB, whereas hard-decision decoding requires $\mathcal{E}_b/J_0 = 11.7$ dB. This
4.1 dB difference in SNR is relatively large. With continuous gaussian noise,
the corresponding SNRs for an error rate of $10^{-6}$ are 5 dB for soft-decision
decoding and 7 dB for hard-decision decoding. Hence, the worst-case pulse
jammer has degraded the performance by 2.6 dB for soft-decision decoding
and by 4.7 dB for hard-decision decoding. These levels of degradation increase
as the constraint length of the convolutional code is decreased. The important
point, however, is that the loss in SNR due to jamming has been reduced from
40 dB for the uncoded system to less than 5 dB for the coded system based on
a $K = 7$, rate 1/2 convolutional code.

FIGURE 13-2-10 Optimal duty cycle for pulse jammer. [From
Martin and McAdam (1980). © 1980 IEEE.]



A simpler method for evaluating the performance of a coded AJ communication
system is to use the cutoff rate parameter $R_0$ as proposed by
Omura and Levitt (1982). For example, with binary-coded modulation, the
cutoff rate may be expressed as

$$R_0 = 1 - \log_2(1 + D_\alpha) \tag{13-2-64}$$


FIGURE 13-2-11 Performance of rate 1/2 convolutional codes
with hard-decision Viterbi decoding binary
PSK with optimal pulse jamming. [From
Martin and McAdam (1980). © 1980 IEEE.]

FIGURE 13-2-12 Performance of rate 1/2 convolutional codes
with soft-decision Viterbi decoding binary
PSK with optimal pulse jamming. [From
Martin and McAdam (1980). © 1980 IEEE.]



where the factor $D_\alpha$ depends on the channel noise characteristics and the
decoder processing. Recall that for binary PSK in an AWGN channel and
soft-decision decoding,

$$D_\alpha = e^{-\mathcal{E}_c/N_0} \tag{13-2-65}$$

where $\mathcal{E}_c$ is the energy per coded bit; and for hard-decision decoding,

$$D_\alpha = \sqrt{4p(1 - p)} \tag{13-2-66}$$

where $p$ is the probability of a coded bit error. Here, we have $N_0 = J_0$.

For a coded binary PSK, with pulse jamming, Omura and Levitt (1982) have
shown that

$$D_\alpha = \alpha e^{-\alpha\mathcal{E}_c/N_0} \quad \text{for soft-decision decoding with knowledge of jammer state} \tag{13-2-67}$$

$$D_\alpha = \min_{\lambda \ge 0}\left\{\left[\alpha \exp(\lambda^2\mathcal{E}_c N_0/\alpha) + 1 - \alpha\right]\exp(-2\lambda\mathcal{E}_c)\right\} \quad \text{for soft-decision decoding with no knowledge of jammer state} \tag{13-2-68}$$

$$D_\alpha = \alpha\sqrt{4p(1 - p)} \quad \text{for hard-decision decoding with knowledge of the jammer state} \tag{13-2-69}$$

$$D_\alpha = \sqrt{4\alpha p(1 - \alpha p)} \quad \text{for hard-decision decoding with no knowledge of the jammer state} \tag{13-2-70}$$

where the probability of error for hard-decision decoding of binary PSK is

$$p = Q\left(\sqrt{2\alpha\mathcal{E}_c/N_0}\right)$$

FIGURE 13-2-13 Cutoff rate for coded DS binary PSK modulation. Curves:
(0) soft-decision decoding in AWGN ($\alpha = 1$);
(1) soft-decision with jammer state information;
(2) hard-decision with jammer state information;
(3) soft-decision with no jammer state information;
(4) hard-decision with no jammer state information.
[From Omura and Levitt (1982). © 1982 IEEE.]

The graphs for $R_0$ as a function of $\mathcal{E}_c/N_0$ are illustrated in Fig. 13-2-13 for
the cases given above. Note that these graphs represent the cutoff rate for the
worst-case value of $\alpha = \alpha^*$ that maximizes $D_\alpha$ (minimizes $R_0$) for each value of
$\mathcal{E}_c/N_0$. Furthermore, note that with soft-decision decoding and no knowledge
of the jammer state, $R_0 = 0$. This situation results from the fact that the
demodulator output is not quantized.
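The worst-case cutoff rate can be approximated numerically by searching over the duty cycle. The sketch below does this for three of the cases (soft-decision with jammer state information, and hard-decision with and without it), taking the coded-bit error probability as Q(sqrt(2 alpha E_c/N_0)). It is an illustration under stated assumptions: the function names and the grid search are choices made here, not the method used by Omura and Levitt, and the soft-decision case without jammer state information is omitted because its worst-case cutoff rate is zero.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def d_soft_jsi(alpha, ecn0):
    """Soft decisions, jammer state known (13-2-67)."""
    return alpha * math.exp(-alpha * ecn0)

def d_hard_jsi(alpha, ecn0):
    """Hard decisions, jammer state known (13-2-69)."""
    p = q_function(math.sqrt(2.0 * alpha * ecn0))
    return alpha * math.sqrt(4.0 * p * (1.0 - p))

def d_hard_no_jsi(alpha, ecn0):
    """Hard decisions, jammer state unknown (13-2-70)."""
    p = q_function(math.sqrt(2.0 * alpha * ecn0))
    return math.sqrt(4.0 * alpha * p * (1.0 - alpha * p))

def worst_case_cutoff_rate(d_func, ecn0_db, steps=2000):
    """Cutoff rate for the duty cycle that maximizes D (grid search over (0, 1])."""
    ecn0 = 10.0 ** (ecn0_db / 10.0)
    d_max = max(d_func(k / steps, ecn0) for k in range(1, steps + 1))
    return 1.0 - math.log2(1.0 + d_max)

# Reference: soft decisions in AWGN (alpha = 1) at E_c/N_0 = 2 dB
r0_awgn = 1.0 - math.log2(1.0 + math.exp(-10.0 ** 0.2))
r0_soft_jsi = worst_case_cutoff_rate(d_soft_jsi, 3.0)
r0_hard_jsi = worst_case_cutoff_rate(d_hard_jsi, 5.0)
r0_hard_no_jsi = worst_case_cutoff_rate(d_hard_no_jsi, 10.0)
print(r0_awgn, r0_soft_jsi, r0_hard_jsi, r0_hard_no_jsi)
```

All four values land close to one another, near 0.73 to 0.76 bits/symbol, which gives a sense of how many decibels each decoder needs under worst-case pulse jamming to match the AWGN reference at 2 dB.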

The graphs in Fig. 13-2-13 may be used to evaluate the performance of
coded systems. To demonstrate the procedure, suppose that we wish to
determine the SNR required to achieve an error probability of $10^{-6}$ with coded
binary PSK in worst-case pulse jamming. To be specific, we assume that we
have a rate 1/2, $K = 7$ convolutional code. We begin with the performance of
the rate 1/2, $K = 7$ convolutional code with soft-decision decoding in an
AWGN channel. At $P_2 = 10^{-6}$, the SNR required is found from Fig. 8-2-21 to
be

$$\mathcal{E}_b/N_0 = 5\ \text{dB}$$

Since the code is rate 1/2, we have

$$\mathcal{E}_c/N_0 = 2\ \text{dB}$$

Now, we go to the graphs in Fig. 13-2-13 and find that for the AWGN channel
(reference system) with $\mathcal{E}_c/N_0 = 2$ dB, the corresponding value of the cutoff
rate is

$$R_0 = 0.74\ \text{bits/symbol}$$


If we have another channel with different noise characteristics (a worst-case
pulse noise channel) but with the same value of the cutoff rate $R_0$, then the
upper bound on the bit error probability is the same, i.e., $10^{-6}$ in this case.
Consequently, we can use this rate to determine the SNR required for the
worst-case pulse jammer channel. From the graphs in Fig. 13-2-13, we find that

$$\frac{\mathcal{E}_c}{J_0} = \begin{cases} 10\ \text{dB} & \text{for hard-decision decoding with no knowledge of jammer state} \\ 5\ \text{dB} & \text{for hard-decision decoding with knowledge of jammer state} \\ 3\ \text{dB} & \text{for soft-decision decoding with knowledge of jammer state} \end{cases}$$

Therefore, the corresponding values of $\mathcal{E}_b/J_0$ for the rate 1/2, $K = 7$
convolutional code are 13, 8, and 6 dB, respectively.
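The step between energy per coded bit and energy per information bit is a fixed offset in decibels: since the energy per coded bit is the code rate times the energy per information bit, a rate 1/2 code adds about 3 dB. The short sketch below makes that bookkeeping explicit; the function name is a hypothetical helper introduced here.

```python
import math

def eb_db_from_ec_db(ec_db, code_rate):
    """Convert energy per coded bit (dB) to energy per information bit (dB)."""
    return ec_db - 10.0 * math.log10(code_rate)

# Values of E_c/J_0 read from the cutoff-rate curves for the three decoders
ec_values_db = [10.0, 5.0, 3.0]
eb_values_db = [eb_db_from_ec_db(v, 0.5) for v in ec_values_db]
print([round(v) for v in eb_values_db])

# The same offset links the AWGN reference values: 2 dB per coded bit, 5 dB per bit
awgn_eb_db = eb_db_from_ec_db(2.0, 0.5)
print(awgn_eb_db)
```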

This general approach may be used to generate error rate graphs for coded 
binary signals in a worst-case pulse jamming channel by using corresponding 
error rate graphs for the AWGN channel. The approach we describe above is
easily generalized to M-ary coded signals as indicated by Omura and Levitt
(1982).

By comparing the cutoff rate for coded DS binary PSK modulation shown in 
Fig. 13-2-13, we note that for rates below 0.7, there is no penalty in SNR with 
soft-decision decoding and jammer state information compared with the 
performance on the AWGN channel ($\alpha = 1$). On the other hand, at $R_0 = 0.7$,
there is a 6 dB difference in performance between the SNR in an AWGN
channel and that required for hard-decision decoding with no jammer state 
information. At rates below 0.4, there is no penalty in SNR with hard-decision 
decoding if the jammer state is unknown. However, there is the expected 2 dB 
loss in hard-decision decoding compared with soft-decision decoding in the 
AWGN channel. 


13-2-4 Generation of PN Sequences 

The generation of PN sequences for spread spectrum applications is a topic 
that has received considerable attention in the technical literature. We shall 
briefly discuss the construction of some PN sequences and present a number of 
important properties of the autocorrelation and cross-correlation functions of 
such sequences. For a comprehensive treatment of this subject, the interested 
reader may refer to the book by Golomb (1967). 





FIGURE 13-2-14 General m-stage shift register with linear feedback.


By far the most widely known binary PN sequences are the maximum-length
shift-register sequences introduced in Section 8-1-3 in the context of
coding and suggested again in Section 13-2-2 for use as low-rate codes. A
maximum-length shift-register sequence, or m-sequence for short, has length
$n = 2^m - 1$ bits and is generated by an m-stage shift register with linear
feedback as illustrated in Fig. 13-2-14. The sequence is periodic with period $n$.
Each period of the sequence contains $2^{m-1}$ ones and $2^{m-1} - 1$ zeros.

In DS spread spectrum applications the binary sequence with elements {0, 1}
is mapped into a corresponding sequence of positive and negative pulses
according to the relation

$$p_i(t) = (2b_i - 1)p(t - iT)$$

where $p_i(t)$ is the pulse corresponding to the element $b_i$ in the sequence with
elements {0, 1}. Equivalently, we may say that the binary sequence with
elements {0, 1} is mapped into a corresponding binary sequence with elements
{-1, 1}. We shall call the equivalent sequence with elements {-1, 1} a bipolar
sequence, since it results in pulses of positive and negative amplitudes.

An important characteristic of a periodic PN sequence is its periodic
autocorrelation function, which is usually defined in terms of the bipolar
sequence as

$$\phi(j) = \sum_{i=1}^{n} (2b_i - 1)(2b_{i+j} - 1), \qquad 0 \le j \le n - 1 \tag{13-2-71}$$

where $n$ is the period. Clearly, $\phi(j + rn) = \phi(j)$ for any integer value $r$.

Ideally, a pseudo-random sequence should have an autocorrelation function
with the property that $\phi(0) = n$ and $\phi(j) = 0$ for $1 \le j \le n - 1$. In the case of m
sequences, the periodic autocorrelation function is

$$\phi(j) = \begin{cases} n & (j = 0) \\ -1 & (1 \le j \le n - 1) \end{cases} \tag{13-2-72}$$

For large values of $n$, i.e., for long m sequences, the size of the off-peak values
of $\phi(j)$ relative to the peak value, $\phi(j)/\phi(0) = -1/n$, is small and, from a
practical viewpoint, inconsequential. Therefore, m sequences are almost ideal
when viewed in terms of their autocorrelation function.
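The two-valued autocorrelation is easy to confirm on a short example. The sketch below generates one period of an m sequence with m = 5 and evaluates the periodic autocorrelation of its bipolar version. The recurrence used, s[t] = s[t-3] XOR s[t-5] (characteristic polynomial x^5 + x^2 + 1), the initial register contents, and the function names are assumptions chosen for illustration.

```python
def m_sequence(delays, m):
    """One period of the sequence defined by s[t] = XOR of s[t - d] for d in delays."""
    n = 2**m - 1
    seq = [1] + [0] * (m - 1)          # any nonzero initial state works
    for t in range(m, n):
        bit = 0
        for d in delays:
            bit ^= seq[t - d]
        seq.append(bit)
    return seq

def periodic_autocorrelation(bits, shift):
    """Periodic autocorrelation of the bipolar sequence 2b - 1."""
    n = len(bits)
    return sum((2 * bits[i] - 1) * (2 * bits[(i + shift) % n] - 1)
               for i in range(n))

bits = m_sequence((3, 5), 5)           # x^5 + x^2 + 1, period 31
phi = [periodic_autocorrelation(bits, j) for j in range(len(bits))]
print(sum(bits), phi[0], set(phi[1:]))
```

The count of ones, the peak value, and the single off-peak value illustrate the balance and autocorrelation properties stated above for n = 31.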
In antijamming applications of PN spread spectrum signals, the period of
the sequence must be large in order to prevent the jammer from learning the
feedback connections of the PN generator. However, this requirement is
impractical in most cases because the jammer can determine the feedback
connections by observing only $2m$ chips from the PN sequence. This
vulnerability of the PN sequence is due to the linearity property of the
generator. To reduce the vulnerability to a jammer, the output sequences from
several stages of the shift register or the outputs from several distinct m
sequences are combined in a nonlinear way to produce a nonlinear sequence
that is considerably more difficult for the jammer to learn. Further reduction in
vulnerability is achieved by frequently changing the feedback connections
and/or the number of stages in the shift register according to some
prearranged plan formulated between the transmitter and the intended receiver.
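To illustrate why 2m chips are enough, the sketch below recovers the feedback connections of a 5-stage generator from its first 10 output chips. It uses the Berlekamp-Massey algorithm, which is a standard technique brought in here for illustration and is not described in the text; the recurrence s[t] = s[t-3] XOR s[t-5], the initial state, and the function names are likewise assumptions.

```python
def lfsr_chips(delays, m, count):
    """First `count` chips of s[t] = XOR of s[t - d] for d in delays."""
    seq = [1] + [0] * (m - 1)
    for t in range(m, count):
        bit = 0
        for d in delays:
            bit ^= seq[t - d]
        seq.append(bit)
    return seq[:count]

def berlekamp_massey(chips):
    """Shortest binary LFSR generating `chips`: returns (length, connection list)."""
    n = len(chips)
    c = [1] + [0] * n                  # current connection polynomial
    b = [1] + [0] * n                  # polynomial before the last length change
    length, last = 0, -1
    for i in range(n):
        # Discrepancy between the observed chip and the LFSR prediction
        d = chips[i]
        for j in range(1, length + 1):
            d ^= c[j] & chips[i - j]
        if d:
            previous = c[:]
            shift = i - last
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length, last, b = i + 1 - length, i, previous
    return length, c[:length + 1]

m = 5
observed = lfsr_chips((3, 5), m, 2 * m)        # the jammer sees only 2m chips
length, connection = berlekamp_massey(observed)
recovered = tuple(j for j in range(1, length + 1) if connection[j])
print(length, recovered)

# The recovered connections reproduce the sequence beyond the observed chips
predicted = lfsr_chips(recovered, length, 31)
actual = lfsr_chips((3, 5), m, 31)
```

Once the connections and m consecutive chips are known, the entire future sequence is determined, which is the vulnerability that nonlinear combining is meant to reduce.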

In some applications, the cross-correlation properties of PN sequences are 
as important as the autocorrelation properties. For example, in CDMA, each 
user is assigned a particular PN sequence. Ideally, the PN sequences among 
users should be mutually orthogonal so that the level of interference
experienced by any one user from transmissions of other users adds on a power 
basis. However, the PN sequences used in practice exhibit some correlation. 

To be specific, we consider the class of m sequences. It is known (Sarwate
and Pursley, 1980) that the periodic cross-correlation function between any
pair of m sequences of the same period can have relatively large peaks. Table
13-2-1 lists the peak magnitude $\phi_{\max}$ for the periodic cross-correlation between
pairs of m sequences for $3 \le m \le 12$. The table also shows the number of m
sequences of length $n = 2^m - 1$ for $3 \le m \le 12$. As we can see, the number of
m sequences of length $n$ increases rapidly with $m$. We also observe that, for
most sequences, the peak magnitude $\phi_{\max}$ of the cross-correlation function is a
large percentage of the peak value of the autocorrelation function.

Such high values for the cross-correlations are undesirable in CDMA. 


TABLE 13-2-1 PEAK CROSS-CORRELATION OF m SEQUENCES AND GOLD SEQUENCES

                    Number of     Peak cross-correlation
 m   n = 2^m - 1   m sequences   $\phi_{\max}$   $\phi_{\max}/\phi(0)$   $t(m)$   $t(m)/\phi(0)$

 3         7             2              5             0.71                5          0.71
 4        15             2              9             0.60                9          0.60
 5        31             6             11             0.35                9          0.29
 6        63             6             23             0.36               17          0.27
 7       127            18             41             0.32               17          0.13
 8       255            16             95             0.37               33          0.13
 9       511            48            113             0.22               33          0.06
10      1023            60            383             0.37               65          0.06
11      2047           176            287             0.14               65          0.03
12      4095           144           1407             0.34              129          0.03

Although it is possible to select a small subset of m sequences that have
relatively smaller cross-correlation peak values, the number of sequences in the
set is usually too small for CDMA applications.

PN sequences with better periodic cross-correlation properties than m 
sequences have been given by Gold (1967, 1968) and Kasami (1966). They are 
derived from m sequences as described below. 

Gold and Kasami proved that certain pairs of m sequences of length $n$
exhibit a three-valued cross-correlation function with values $\{-1, -t(m),
t(m) - 2\}$, where

$$t(m) = \begin{cases} 2^{(m+1)/2} + 1 & (\text{odd } m) \\ 2^{(m+2)/2} + 1 & (\text{even } m) \end{cases} \tag{13-2-73}$$

For example, if $m = 10$ then $t(10) = 2^6 + 1 = 65$ and the three possible values of
the periodic cross-correlation function are $\{-1, -65, 63\}$. Hence the maximum
cross-correlation for the pair of m sequences is 65, while the peak for the
family of 60 possible sequences generated by a 10-stage shift register with
different feedback connections is $\phi_{\max} = 383$, about a sixfold difference in
peak values. Two m sequences of length $n$ with a periodic cross-correlation
function that takes on the possible values $\{-1, -t(m), t(m) - 2\}$ are called
preferred sequences.
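The quantity t(m) is straightforward to tabulate. The sketch below evaluates (13-2-73) for m from 3 to 12 and compares the m = 10 value with the peak of 383 quoted for arbitrary pairs of m sequences; the function name `gold_t` is a hypothetical helper.

```python
def gold_t(m):
    """t(m) from (13-2-73): peak cross-correlation magnitude of a preferred pair."""
    if m % 2 == 1:
        return 2 ** ((m + 1) // 2) + 1
    return 2 ** ((m + 2) // 2) + 1

t_values = [gold_t(m) for m in range(3, 13)]
print(t_values)

# Ratio of the peak for arbitrary m-sequence pairs (383 for m = 10) to t(10)
ratio = 383 / gold_t(10)
print(ratio)
```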

From a pair of preferred sequences, say $a = [a_1 a_2 \ldots a_n]$ and $b =
[b_1 b_2 \ldots b_n]$, we construct a set of sequences of length $n$ by taking the
modulo-2 sum of $a$ with the $n$ cyclically shifted versions of $b$ or vice versa. Thus,
we obtain $n$ new periodic sequences† with period $n = 2^m - 1$. We may also
include the original sequences $a$ and $b$ and, thus, we have a total of $n + 2$
sequences. The $n + 2$ sequences constructed in this manner are called Gold
sequences.


Example 13-2-4 

Let us consider the generation of Gold sequences of length $n = 31 = 2^5 - 1$.
As indicated above for $m = 5$, the cross-correlation peak is

$$t(5) = 2^3 + 1 = 9$$

Two preferred sequences, which may be obtained from Peterson and
Weldon (1972), are described by the polynomials

$$g_1(p) = p^5 + p^2 + 1$$
$$g_2(p) = p^5 + p^4 + p^2 + p + 1$$


† An equivalent method for generating the $n$ new sequences is to employ a shift register of
length $2m$ with feedback connections specified by the polynomial $h(p) = g_1(p)g_2(p)$, where $g_1(p)$
and $g_2(p)$ are the polynomials that specify the feedback connections of the m-stage shift registers
that generate the m sequences $a$ and $b$.

FIGURE 13-2-15 Generation of Gold sequences of length 31.


The shift registers for generating the two m sequences and the
corresponding Gold sequences are shown in Fig. 13-2-15. In this case, there
are 33 different sequences, corresponding to the 33 relative phases of the
two m sequences. Of these, 31 sequences are non-maximal-length
sequences.
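The construction in this example can be reproduced in a few lines. The sketch below generates the two m sequences from recurrences whose characteristic polynomials are the two polynomials given in the example, forms the 33 Gold sequences, and collects the periodic cross-correlation values over all pairs and shifts. The initial register contents, the mapping from each polynomial to a recurrence, and the function names are assumptions made for illustration.

```python
def lfsr_sequence(delays, m):
    """One period of s[t] = XOR of s[t - d] for d in delays."""
    n = 2**m - 1
    seq = [1] + [0] * (m - 1)
    for t in range(m, n):
        bit = 0
        for d in delays:
            bit ^= seq[t - d]
        seq.append(bit)
    return seq

def periodic_correlation(x, y, shift):
    """Periodic correlation of the bipolar versions of x and y."""
    n = len(x)
    return sum((2 * x[i] - 1) * (2 * y[(i + shift) % n] - 1) for i in range(n))

def gold_set(a, b):
    """a, b, and the modulo-2 sums of a with all n cyclic shifts of b."""
    n = len(a)
    combined = [[a[i] ^ b[(i + k) % n] for i in range(n)] for k in range(n)]
    return [list(a), list(b)] + combined

a = lfsr_sequence((3, 5), 5)         # p^5 + p^2 + 1
b = lfsr_sequence((1, 3, 4, 5), 5)   # p^5 + p^4 + p^2 + p + 1
sequences = gold_set(a, b)

values = set()
for idx, x in enumerate(sequences):
    for y in sequences[idx + 1:]:
        for s in range(len(a)):
            values.add(periodic_correlation(x, y, s))
print(len(sequences), sorted(values))
```

Only three distinct cross-correlation values occur across the whole set, with magnitude at most t(5) = 9.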


With the exception of the sequences $a$ and $b$, the set of Gold sequences does
not comprise maximum-length shift-register sequences of length $n$. Hence,
their autocorrelation functions are not two-valued. Gold (1968) has shown that
the cross-correlation function for any pair of sequences from the set of $n + 2$
Gold sequences is three-valued with possible values $\{-1, -t(m), t(m) - 2\}$,
where $t(m)$ is given by (13-2-73). Similarly, the off-peak autocorrelation
function for a Gold sequence takes on values from the set $\{-1, -t(m), t(m) - 2\}$.
Hence, the off-peak values of the autocorrelation function are upper-bounded
by $t(m)$.

The values of the off-peak autocorrelation function and the peak
cross-correlation function, i.e., $t(m)$, for Gold sequences are listed in Table 13-2-1.
Also listed are the values normalized by $\phi(0)$.

It is interesting to compare the peak cross-correlation value of Gold
sequences with a known lower bound on the cross-correlation between any
pair of binary sequences of period $n$ in a set of $M$ sequences. A lower bound
developed by Welch (1974) for $\phi_{\max}$ is

$$\phi_{\max} \ge n\sqrt{\frac{M - 1}{Mn - 1}} \tag{13-2-74}$$

which, for large values of $n$ and $M$, is well approximated as $\sqrt{n}$. For Gold
sequences, $n = 2^m - 1$ and, hence, the lower bound is $\phi_{\max} \approx 2^{m/2}$. This bound
is lower by a factor of $\sqrt{2}$ for odd $m$ and by a factor of 2 for even $m$ relative to
$\phi_{\max} = t(m)$ for Gold sequences.
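A quick numerical comparison shows how close Gold sequences come to the Welch bound. The sketch below evaluates the bound (13-2-74) for a set of M = n + 2 sequences and forms the ratio t(m)/bound for one even and one odd value of m; the function names are hypothetical helpers.

```python
import math

def welch_bound(n, num_sequences):
    """Welch lower bound on the peak cross-correlation, as in (13-2-74)."""
    return n * math.sqrt((num_sequences - 1) / (num_sequences * n - 1))

def gold_t(m):
    """Peak cross-correlation t(m) of Gold sequences, from (13-2-73)."""
    exponent = (m + 1) // 2 if m % 2 == 1 else (m + 2) // 2
    return 2 ** exponent + 1

def gold_to_welch_ratio(m):
    n = 2**m - 1
    return gold_t(m) / welch_bound(n, n + 2)   # Gold set has n + 2 sequences

bound_10 = welch_bound(1023, 1025)
ratio_even = gold_to_welch_ratio(10)
ratio_odd = gold_to_welch_ratio(11)
print(bound_10, ratio_even, ratio_odd)
```

For m = 10 the ratio is close to 2, and for m = 11 it is close to the square root of 2, in line with the statement above.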
A procedure similar to that used for generating Gold sequences will
generate a smaller set of $M = 2^{m/2}$ binary sequences of period $n = 2^m - 1$,
where $m$ is even. In this procedure, we begin with an m sequence $a$ and we
form a binary sequence $b$ by taking every $(2^{m/2} + 1)$th bit of $a$. Thus, the sequence
$b$ is formed by decimating $a$ by $2^{m/2} + 1$. It can be verified that the resulting $b$ is
periodic with period $2^{m/2} - 1$. For example, if $m = 10$, the period of $a$ is
$n = 1023$ and the period of $b$ is 31. Hence, if we observe 1023 bits of the
sequence $b$, we shall see 33 repetitions of the 31-bit sequence. Now, by taking
$n = 2^m - 1$ bits of the sequences $a$ and $b$, we form a new set of sequences by
adding, modulo-2, the bits from $a$ and the bits from $b$ and all $2^{m/2} - 2$ cyclic
shifts of the bits from $b$. By including $a$ in the set, we obtain a set of $2^{m/2}$ binary
sequences of length $n = 2^m - 1$. These are called Kasami sequences. The
autocorrelation and cross-correlation functions of these sequences take on
values from the set $\{-1, -(2^{m/2} + 1), 2^{m/2} - 1\}$. Hence, the maximum
cross-correlation value for any pair of sequences from the set is

$$\phi_{\max} = 2^{m/2} + 1 \tag{13-2-75}$$

This value of $\phi_{\max}$ satisfies the Welch lower bound for a set of $2^{m/2}$ sequences
of length $n = 2^m - 1$. Hence, the Kasami sequences are optimal.
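The decimation procedure can be tried on the smallest even case, m = 4. The sketch below builds the four Kasami sequences of length 15 and collects their off-peak autocorrelation and cross-correlation values. The generating recurrence s[t] = s[t-3] XOR s[t-4] (characteristic polynomial x^4 + x + 1), the initial state, and the function names are assumptions for illustration. One practical caveat, not discussed in the text, is that for some starting phases of the m sequence the decimated sequence is all zeros, so the sketch checks for that case.

```python
def lfsr_sequence(delays, m):
    """One period of s[t] = XOR of s[t - d] for d in delays."""
    n = 2**m - 1
    seq = [1] + [0] * (m - 1)
    for t in range(m, n):
        bit = 0
        for d in delays:
            bit ^= seq[t - d]
        seq.append(bit)
    return seq

def periodic_correlation(x, y, shift):
    """Periodic correlation of the bipolar versions of x and y."""
    n = len(x)
    return sum((2 * x[i] - 1) * (2 * y[(i + shift) % n] - 1) for i in range(n))

def kasami_small_set(a, m):
    """a together with a XOR every distinct cyclic shift of the decimated sequence b."""
    n = len(a)
    step = 2 ** (m // 2) + 1
    b = [a[(step * i) % n] for i in range(n)]   # decimate a by 2^(m/2) + 1
    if not any(b):
        raise ValueError("decimation gave the all-zero sequence; use another phase of a")
    period_b = 2 ** (m // 2) - 1
    family = [list(a)]
    for k in range(period_b):
        family.append([a[i] ^ b[(i + k) % n] for i in range(n)])
    return family

a = lfsr_sequence((3, 4), 4)          # x^4 + x + 1, period 15
family = kasami_small_set(a, 4)

values = set()
for i, x in enumerate(family):
    for y in family[i:]:
        for s in range(len(a)):
            if x is y and s == 0:
                continue               # skip the autocorrelation peak
            values.add(periodic_correlation(x, y, s))
print(len(family), sorted(values))
```

With m = 4 the largest magnitude observed is 2^(m/2) + 1 = 5, compared with t(4) = 9 for Gold sequences of the same length, at the cost of a much smaller set.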

Besides the well-known Gold and Kasami sequences, there are other binary
sequences appropriate for CDMA applications. The interested reader may
refer to the work of Scholtz (1979), Olsen (1977), and Sarwate and Pursley
(1980).

Finally, we wish to indicate that, although we have discussed the periodic 
cross-correlation function between pairs of periodic sequences, many practical 
CDMA systems may use information bit durations that encompass only 
fractions of a periodic sequence. In such cases, it is the partial-period 
cross-correlation between two sequences that is important. A number of 
papers deal with this problem, including those by Lindholm (1968), Wainberg 
and Wolf (1970), Fredricsson (1975), Bekir et al. (1978), and Pursley (1979). 

13-3 FREQUENCY-HOPPED SPREAD SPECTRUM 
SIGNALS 

In a frequency-hopped (FH) spread spectrum communications system the
available channel bandwidth is subdivided into a large number of contiguous 
frequency slots. In any signaling interval, the transmitted signal occupies one 
or more of the available frequency slots. The selection of the frequency slot(s) 
in each signaling interval is made pseudo-randomly according to the output 
from a PN generator. Figure 13-3-1 illustrates a particular frequency-hopped 
pattern in the time-frequency plane. 

A block diagram of the transmitter and receiver for a frequency-hopped
spread spectrum system is shown in Fig. 13-3-2. The modulation is usually
either binary or M-ary FSK. For example, if binary FSK is employed, the
modulator selects one of two frequencies corresponding to the transmission of
either a 1 or a 0. The resulting FSK signal is translated in frequency by an
amount that is determined by the output sequence from the PN generator,
which, in turn, is used to select a frequency that is synthesized by the frequency
synthesizer. This frequency is mixed with the output of the modulator and the
resultant frequency-translated signal is transmitted over the channel. For
example, $m$ bits from the PN generator may be used to specify $2^m - 1$ possible
frequency translations.

FIGURE 13-3-1 An example of a frequency-hopped (FH) pattern.
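One simple way to picture the mapping from PN generator bits to frequency translations is to read the m-bit register contents as a slot index. The sketch below does this for m = 5. It is a simplified illustration under stated assumptions: the recurrence, the initial state, and the function name are hypothetical, and a practical system would likely clock the generator several times per hop rather than reuse overlapping register contents.

```python
def hop_pattern(delays, m, hops):
    """Slot indices taken from successive m-bit states of a linear feedback shift register."""
    state = [1] + [0] * (m - 1)        # state = [s[t], ..., s[t+m-1]], never all zeros
    slots = []
    for _ in range(hops):
        slots.append(int("".join(str(bit) for bit in state), 2))
        new_bit = 0
        for d in delays:
            new_bit ^= state[m - d]    # s[t+m-d] sits at index m - d
        state = state[1:] + [new_bit]
    return slots

slots = hop_pattern((3, 5), 5, 31)     # recurrence s[t] = s[t-3] XOR s[t-5]
print(slots)
```

Because the register never holds the all-zero state, one period of the generator visits each of the 2^m - 1 = 31 nonzero slot indices exactly once.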

At the receiver, we have an identical PN generator, synchronized with the 
received signal, which is used to control the output of the frequency 
synthesizer. Thus, the pseudo-random frequency translation introduced at the 
transmitter is removed at the receiver by mixing the synthesizer output with 
the received signal. The resultant signal is demodulated by means of an FSK 
demodulator. A signal for maintaining synchronism of the PN generator with 
the frequency-translated received signal is usually extracted from the received 
signal. 

Although PSK modulation gives better performance than FSK in an
AWGN channel, it is difficult to maintain phase coherence in the synthesis of
the frequencies used in the hopping pattern and, also, in the propagation of the
signal over the channel as the signal is hopped from one frequency to another
over a wide bandwidth. Consequently, FSK modulation with noncoherent
detection is usually employed with FH spread spectrum signals.

FIGURE 13-3-2 Block diagram of a FH spread spectrum system.

FIGURE 13-3-3 Block diagram of an independent tone FH spread spectrum system.

In the frequency-hopping system depicted in Fig. 13-3-2, the carrier
frequency is pseudo-randomly hopped in every signaling interval. The $M$
information-bearing tones are contiguous and separated in frequency by $1/T_c$,
where $T_c$ is the signaling interval. This type of frequency hopping is called
block hopping.

Another type of frequency hopping that is less vulnerable to some jamming 
strategies is independent tone hopping. In this scheme, the M possible tones 
from the modulator are assigned widely dispersed frequency slots. One method 
for accomplishing this is illustrated in Fig. 13-3-3. Here, the m bits from the PN 
generator and the k information bits are used to specify the frequency slots for 
the transmitted signal. 

The frequency-hopping rate is usually selected to be either equal to the 
(coded or uncoded) symbol rate or faster than that rate. If there are multiple 
hops per symbol, we have a fast-hopped signal. On the other hand, if the 
hopping is performed at the symbol rate, we have a slow-hopped signal. 

Fast frequency hopping is employed in AJ applications when it is necessary 
to prevent a type of jammer, called a follower jammer, from having sufficient 
time to intercept the frequency and retransmit it along with adjacent 
frequencies so as to create interfering signal components. However, there is a 
penalty incurred in subdividing a signal into several frequency-hopped ele- 
ments because the energy from these separate elements is combined non- 
coherently. Consequently, the demodulator incurs a penalty in the form of a 
noncoherent combining loss as described in Section 12-1. 

FH spread spectrum signals are used primarily in digital communications 
systems that require AJ protection and in CDMA, where many users share a 
common bandwidth. In most cases, a FH signal is preferred over a DS spread 
spectrum signal because of the stringent synchronization requirements 
inherent in DS spread spectrum signals. Specifically, in a DS system, timing 
and synchronization must be established to within a fraction of the chip 
interval $T_c = 1/W$. On the other hand, in an FH system, the chip interval is the 
time spent in transmitting a signal in a particular frequency slot of bandwidth 
$B \ll W$. But this interval is approximately $1/B$, which is much larger than $1/W$. 
Hence the timing requirements in a FH system are not as stringent as in a PN 
system. 

In Sections 13-3-2 and 13-3-3, we shall focus on the AJ and CDMA 
applications of FH spread spectrum signals. First, we shall determine the error 
rate performance of an uncoded and a coded FH signal in the presence of 
broadband AWGN interference. Then we shall consider a more serious type of 
interference that arises in AJ and CDMA applications, called partial-band 
interference. The benefits obtained from coding for this type of interference are 
determined. We conclude the discussion in Section 13-3-3 with an example of 
an FH CDMA system that was designed for use by mobile users with a satellite 
serving as the channel. 


13-3-1 Performance of FH Spread Spectrum Signals in 
AWGN Channel 


Let us consider the performance of a FH spread spectrum signal in the 
presence of broadband interference characterized statistically as AWGN with 
power spectral density $J_0$. For binary orthogonal FSK with noncoherent 
detection and slow frequency hopping (1 hop/bit), the probability of error, 
derived in Section 5-4-1, is 

$$P_2 = \tfrac{1}{2} e^{-\gamma_b/2} \tag{13-3-1}$$

where $\gamma_b = \mathcal{E}_b/J_0$. On the other hand, if the bit interval is subdivided into $L$ 
subintervals and FH binary FSK is transmitted in each subinterval, we have a 
fast FH signal. With square-law combining of the output signals from the 
corresponding matched filters for the $L$ subintervals, the error rate 
performance of the FH signal, obtained from the results in Section 12-1, is 

$$P_2(L) = \frac{1}{2^{2L-1}}\, e^{-\gamma_b/2} \sum_{i=0}^{L-1} K_i \left(\tfrac{1}{2}\gamma_b\right)^i \tag{13-3-2}$$

where the SNR per bit is $\gamma_b = \mathcal{E}_b/J_0 = L\gamma_c$, $\gamma_c$ is the SNR per chip in the 
$L$-chip symbol, and 

$$K_i = \frac{1}{i!} \sum_{r=0}^{L-1-i} \binom{2L-1}{r} \tag{13-3-3}$$

We recall that, for a given SNR per bit $\gamma_b$, the error rate obtained from 
(13-3-2) is larger than that obtained from (13-3-1). The difference in SNR for a 
given error rate and a given $L$ is called the noncoherent combining loss, which 
was described and illustrated in Section 12-1. 
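
The noncoherent combining loss can be estimated numerically from (13-3-2) and 
(13-3-3). The sketch below is only an illustration: the function names, the 
bisection search, and the target error rate of 1e-5 are choices made here and 
do not come from the text. 

```python
from math import comb, exp, factorial, log10


def p2_fast_fh(gamma_b: float, L: int) -> float:
    """Error probability (13-3-2) for binary FSK with L hops per bit and
    square-law combining; gamma_b is the SNR per bit (linear, not dB)."""
    total = 0.0
    for i in range(L):
        # K_i from (13-3-3): inner sum runs over r = 0 .. L-1-i
        k_i = sum(comb(2 * L - 1, r) for r in range(L - i)) / factorial(i)
        total += k_i * (gamma_b / 2) ** i
    return exp(-gamma_b / 2) * total / 2 ** (2 * L - 1)


def snr_for_error_rate(target: float, L: int) -> float:
    """Bisection for the gamma_b (linear) at which p2_fast_fh equals target."""
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if p2_fast_fh(mid, L) > target:
            lo = mid          # error rate still too high: need more SNR
        else:
            hi = mid
    return (lo + hi) / 2


def combining_loss_db(target: float, L: int) -> float:
    """Extra SNR per bit (dB) needed with L hops per bit relative to L = 1."""
    return 10 * log10(snr_for_error_rate(target, L) / snr_for_error_rate(target, 1))


for L in (1, 2, 4, 8):
    print(L, round(combining_loss_db(1e-5, L), 2), "dB")
```

For L = 1 the function reduces to (13-3-1), so the loss is zero by 
construction; the loss grows as the bit energy is split over more hops. 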

Coding improves the performance of the FH spread spectrum system by an 
amount, which we call the coding gain, that depends on the code parameters. 
Suppose we use a linear binary $(n, k)$ block code and binary FSK modulation 
with one hop per coded bit for transmitting the bits. With soft-decision 
decoding of the square-law-demodulated FSK signal, the probability of a code 
word error is upper-bounded as 

$$P_M \le \sum_{m=2}^{M} P_2(m) \tag{13-3-4}$$

where $P_2(m)$ is the error probability in deciding between the $m$th code word 
and the all-zero code word when the latter has been transmitted. The 
expression for $P_2(m)$ was derived in Section 8-1-4 and has the same form as 
(13-3-2) and (13-3-3), with $L$ being replaced by $w_m$ and $\gamma_b$ by $\gamma_b R_c w_m$, where 
$w_m$ is the weight of the $m$th code word and $R_c$ is the code rate. The product 
$R_c w_m$, which is not less than $R_c d_{\min}$, represents the coding gain. Thus, we have 
the performance of a block coded FH system with slow frequency hopping in 
broadband interference. 

The probability of error for fast frequency hopping with $n_2$ hops per coded 
bit is obtained by reinterpreting the binary event probability $P_2(m)$ in (13-3-4). 
The $n_2$ hops per coded bit may be interpreted as a repetition code, which, 
when combined with a nontrivial $(n_1, k)$ binary linear code having weight 
distribution $\{w_m\}$, yields an $(n_1 n_2, k)$ binary linear code with weight 
distribution $\{n_2 w_m\}$. Hence, $P_2(m)$ has the form given in (13-3-2), with $L$ replaced by 
$n_2 w_m$ and $\gamma_b$ by $\gamma_b R_c n_2 w_m$, where $R_c = k/n_1 n_2$. Note that $\gamma_b R_c n_2 w_m = 
\gamma_b w_m k/n_1$, which is just the coding gain obtained from the nontrivial $(n_1, k)$ 
code. Consequently, the use of the repetition code will result in an increase in 
the noncoherent combining loss. 

With hard-decision decoding and slow frequency hopping, the probability of 
a coded bit error at the output of the demodulator for noncoherent detection is 

$$p = \tfrac{1}{2} e^{-\gamma_b R_c/2} \tag{13-3-5}$$

The code word error probability is easily upper-bounded, by use of the 
Chernoff bound, as 

$$P_M \le \sum_{m=2}^{M} [4p(1-p)]^{w_m/2} \tag{13-3-6}$$

However, if fast frequency hopping is employed with $n_2$ hops per coded bit, 
and the square-law-detected outputs from the corresponding matched filters 
for the $n_2$ hops are added as in soft-decision decoding to form the two decision 
variables for the coded bits, the bit error probability $p$ is also given by (13-3-2), 
with $L$ replaced by $n_2$ and $\gamma_b$ replaced by $\gamma_b R_c n_2$, where $R_c$ is the rate of the 
nontrivial $(n_1, k)$ code. Consequently, the performance of the fast FH system 
in broadband interference is degraded relative to the slow FH system by an 
amount equal to the noncoherent combining loss of the signals received from 
the $n_2$ hops. 

We have observed that for both hard-decision and soft-decision decoding, 
the use of the repetition code in a fast-frequency-hopping system yields no 
coding gain. The only coding gain obtained comes from the $(n_1, k)$ block code. 
Hence, the repetition code is inefficient in a fast FH system with noncoherent 
combining. A more efficient coding method is one in which either a single 
low-rate binary code or a concatenated code is employed. Additional 
improvements in performance may be obtained by using nonbinary codes in 
conjunction with $M$-ary FSK. Bounds on the error probability for this case may be 
obtained from the results given in Section 12-1. 
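
To make the two bounds concrete, the sketch below evaluates (13-3-4) and 
(13-3-6) for slow hopping (one hop per coded bit). The (7,4) Hamming code is 
used purely as an example; it is not a code discussed in the text, and the 
helper names and the operating point of 12 dB are likewise assumptions. 

```python
from math import comb, exp, factorial

# Weight distribution of the (7,4) Hamming code: {weight: number of code words}
WEIGHTS = {3: 7, 4: 7, 7: 1}
N, K = 7, 4
RATE = K / N


def p2_square_law(gamma: float, L: int) -> float:
    """Form of (13-3-2)/(13-3-3): total SNR `gamma` combined over L chips."""
    total = 0.0
    for i in range(L):
        k_i = sum(comb(2 * L - 1, r) for r in range(L - i)) / factorial(i)
        total += k_i * (gamma / 2) ** i
    return exp(-gamma / 2) * total / 2 ** (2 * L - 1)


def soft_decision_bound(gamma_b: float) -> float:
    """Union bound (13-3-4): L -> w_m and gamma_b -> gamma_b * R_c * w_m."""
    return sum(count * p2_square_law(gamma_b * RATE * w, w)
               for w, count in WEIGHTS.items())


def hard_decision_bound(gamma_b: float) -> float:
    """Chernoff bound (13-3-6) with the coded-bit error rate p of (13-3-5)."""
    p = 0.5 * exp(-gamma_b * RATE / 2)
    return sum(count * (4 * p * (1 - p)) ** (w / 2)
               for w, count in WEIGHTS.items())


gamma_b = 10 ** (12 / 10)   # 12 dB SNR per bit, an arbitrary operating point
soft = soft_decision_bound(gamma_b)
hard = hard_decision_bound(gamma_b)
print(f"soft-decision bound: {soft:.3e}")
print(f"hard-decision bound: {hard:.3e}")
```

Both quantities are upper bounds rather than exact error rates, so the gap 
between them only indicates the trend in favor of soft-decision decoding. 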

Although we have evaluated the performance of linear block codes only in 
the above discussion, it is relatively easy to derive corresponding performance 
results for binary convolutional codes. We leave as an exercise for the reader 
the derivation of the bit error probability for soft-decision Viterbi decoding 
and hard-decision Viterbi decoding of FH signals corrupted by broadband 
interference. 

Finally, we observe that $\mathcal{E}_b$, the energy per bit, can be expressed as 
$\mathcal{E}_b = P_{av}/R$, where $R$ is the information rate in bits per second and $J_0 = J_{av}/W$. 
Therefore, $\gamma_b$ may be expressed as 

$$\gamma_b = \frac{\mathcal{E}_b}{J_0} = \frac{W/R}{J_{av}/P_{av}} \tag{13-3-7}$$

In this expression, we recognize $W/R$ as the processing gain and $J_{av}/P_{av}$ as the 
jamming margin for the FH spread spectrum signal. 
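
Relation (13-3-7) is additive in decibels, as the short sketch below shows. 
The bandwidth, bit rate, and required SNR are made-up numbers for 
illustration; 13.4 dB is roughly the SNR per bit at which (13-3-1) gives an 
error rate of 1e-5. 

```python
from math import log10


def processing_gain_db(bandwidth_hz: float, bit_rate_bps: float) -> float:
    """Processing gain W/R in dB."""
    return 10 * log10(bandwidth_hz / bit_rate_bps)


def jamming_margin_db(bandwidth_hz: float, bit_rate_bps: float,
                      required_gamma_b_db: float) -> float:
    """Largest J_av/P_av (dB) that still leaves the required SNR per bit,
    from (13-3-7): gamma_b = (W/R) / (J_av/P_av)."""
    return processing_gain_db(bandwidth_hz, bit_rate_bps) - required_gamma_b_db


W = 10e6      # hopping bandwidth in Hz (assumed)
R = 10e3      # information rate in bits/s (assumed)
pg = processing_gain_db(W, R)
margin = jamming_margin_db(W, R, required_gamma_b_db=13.4)
print(f"processing gain = {pg:.1f} dB, jamming margin = {margin:.1f} dB")
```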


13-3-2 Performance of FH Spread Spectrum Signals in 
Partial-Band Interference 

The partial-band interference considered in this subsection is modeled as a 
zero-mean gaussian random process with a flat power spectral density over a 
fraction $\alpha$ of the total bandwidth $W$ and zero elsewhere. In the region or 
regions where the power spectral density is nonzero, its value is $\Phi_{zz}(f) = J_0/\alpha$, 
$0 < \alpha \le 1$. This model of the interference may be applied to a jamming signal 
or to interference from other users in a FH CDMA system. 

Suppose that the partial-band interference comes from a jammer who may 
select $\alpha$ to optimize the effect on the communications system. In an uncoded 
pseudo-randomly hopped (slow-hopping) FH system with binary FSK modulation 
and noncoherent detection, the received signal will be jammed with 
probability $\alpha$ and it will not be jammed with probability $1 - \alpha$. When it is 
jammed, the probability of error is $\tfrac{1}{2}\exp(-\alpha\mathcal{E}_b/2J_0)$ and when it is not 
jammed, the demodulation is error-free. Consequently, the average probability 
of error is 

$$P_2(\alpha) = \tfrac{1}{2}\alpha \exp\left(-\frac{\alpha\mathcal{E}_b}{2J_0}\right) \tag{13-3-8}$$

where $\mathcal{E}_b/J_0$ may also be expressed as $(W/R)/(J_{av}/P_{av})$. 

FIGURE 13-3-4 Performance of binary FSK with partial-band interference. 
(Horizontal axis: SNR per bit, $\gamma_b$ (dB).) 

Figure 13-3-4 illustrates the error rate as a function of $\mathcal{E}_b/J_0$ for several 
values of $\alpha$. The jammer's optimum strategy is to select the value of $\alpha$ that 
maximizes the error probability. By differentiating $P_2(\alpha)$ and solving for the 
extremum with the restriction that $0 \le \alpha \le 1$, we find that 

$$\alpha^* = \begin{cases} \dfrac{1}{\mathcal{E}_b/2J_0} = \dfrac{2(J_{av}/P_{av})}{W/R} & (\mathcal{E}_b/J_0 \ge 2) \\[2ex] 1 & (\mathcal{E}_b/J_0 < 2) \end{cases} \tag{13-3-9}$$

The corresponding error probability for the worst-case partial-band jammer is 

$$P_2 = \frac{e^{-1}}{\mathcal{E}_b/J_0}, \qquad \mathcal{E}_b/J_0 \ge 2 \tag{13-3-10}$$

Whereas the error probability decreases exponentially for full-band jamming, 
we now find that the error probability decreases only inversely with $\mathcal{E}_b/J_0$ for 
the worst-case partial-band jamming. This result is similar to the error rate 
performance of binary FSK in a Rayleigh fading channel (see Section 14-3) and 
to the uncoded DS spread spectrum system corrupted by worst-case pulse 
jamming (see Section 13-2-3). 
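
The worst-case jamming fraction can be checked numerically. The sketch below 
evaluates (13-3-8) and compares the closed-form optimum $\alpha^* = 2/(\mathcal{E}_b/J_0)$ 
with a brute-force search over $\alpha$; the function names and the 20 dB 
operating point are choices made for this illustration. 

```python
from math import exp


def p2_partial_band(gamma_b: float, alpha: float) -> float:
    """Average error probability (13-3-8); gamma_b = E_b/J_0 (linear)."""
    return 0.5 * alpha * exp(-alpha * gamma_b / 2)


def worst_case_alpha(gamma_b: float) -> float:
    """Jammer's best duty factor: 2/gamma_b, capped at full-band jamming."""
    return min(1.0, 2.0 / gamma_b)


def p2_worst_case(gamma_b: float) -> float:
    """Error probability under the worst-case partial-band jammer."""
    return p2_partial_band(gamma_b, worst_case_alpha(gamma_b))


gamma_b = 100.0                                   # 20 dB
grid = [i / 10000 for i in range(1, 10001)]       # alpha = 0.0001 .. 1.0
brute_force = max(p2_partial_band(gamma_b, a) for a in grid)
print(worst_case_alpha(gamma_b), p2_worst_case(gamma_b), brute_force)
print(p2_partial_band(gamma_b, 1.0))              # full-band jamming, far smaller
```

At 20 dB the full-band error rate is negligible, while the worst-case 
partial-band jammer holds the error rate near $e^{-1}/100$. 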

As we shall demonstrate below, signal diversity obtained by means of 
coding provides a significant improvement in performance relative to uncoded 
signals. This same approach to signal design is also effective for signaling over 
a fading channel, as we shall demonstrate in Chapter 14. 

To illustrate the benefits of diversity in a FH spread spectrum signal with 
partial-band interference, we assume that the same information symbol is 
transmitted by binary FSK on $L$ independent frequency hops. This may be 
accomplished by subdividing the signaling interval into $L$ subintervals, as 
described previously for fast frequency hopping. After the hopping pattern is 
removed, the signal is demodulated by passing it through a pair of matched 
filters whose outputs are square-law-detected and sampled at the end of each 
subinterval. The square-law-detected signals corresponding to the $L$ frequency 
hops are weighted and summed to form the two decision variables (metrics), 
which are denoted as $U_1$ and $U_2$. 

When the decision variable $U_1$ contains the signal components, $U_1$ and $U_2$ 
may be expressed as 

$$U_1 = \sum_{k=1}^{L} \beta_k\, |2\mathcal{E}_c + N_{1k}|^2, \qquad U_2 = \sum_{k=1}^{L} \beta_k\, |N_{2k}|^2 \tag{13-3-11}$$

where $\{\beta_k\}$ represent the weighting coefficients, $\mathcal{E}_c$ is the signal energy per chip 
in the $L$-chip symbol, and $\{N_{jk}\}$ represent the additive gaussian noise terms at 
the output of the matched filters. 

The coefficients $\{\beta_k\}$ are optimally selected to prevent the jammer from 
saturating the combiner should the transmitted frequencies be successfully hit 
in one or more hops. Ideally, $\beta_k$ is selected to be equal to the reciprocal of the 
variance of the corresponding noise terms $\{N_k\}$. Thus, the noise variance for 
each chip is normalized to unity by this weighting and the corresponding signal 
is also scaled accordingly. This means that when the signal frequencies on a 
particular hop are jammed, the corresponding weight is very small. In the 
absence of jamming on a given hop, the weight is relatively large. In practice, 
for partial-band noise jamming, the weighting may be accomplished by use of 
an AGC having a gain that is set on the basis of noise power measurements 
obtained from frequency bands adjacent to the transmitted tones. This is 
equivalent to having side information (knowledge of jammer state) at the 
decoder. 

Suppose that we have broadband gaussian noise with power spectral density 
$N_0$ and partial-band interference, over $\alpha W$ of the frequency band, which is also 
gaussian with power spectral density $J_0/\alpha$. In the presence of partial-band 
interference, the second moments of the noise terms $N_{1k}$ and $N_{2k}$ are 

$$\sigma_k^2 = \tfrac{1}{2}E(|N_{1k}|^2) = \tfrac{1}{2}E(|N_{2k}|^2) = 2\mathcal{E}_c\left(N_0 + \frac{J_0}{\alpha}\right) \tag{13-3-12}$$

In this case, we select $\beta_k = 1/\sigma_k^2 = [2\mathcal{E}_c(N_0 + J_0/\alpha)]^{-1}$. In the absence of 
partial-band interference, $\sigma_k^2 = 2\mathcal{E}_c N_0$ and, hence, $\beta_k = (2\mathcal{E}_c N_0)^{-1}$. Note that $\beta_k$ 
is a random variable. 

An error occurs in the demodulation if $U_2 > U_1$. Although it is possible to 
determine the exact error probability, we shall resort to the Chernoff bound, 
which yields a result that is much easier to evaluate and interpret. Specifically, 
the Chernoff (upper) bound on the error probability is 

$$P_2 = P(U_2 - U_1 > 0) \le E\{\exp[\nu(U_2 - U_1)]\} = E\left\{\exp\left[-\nu\sum_{k=1}^{L}\beta_k\left(|2\mathcal{E}_c + N_{1k}|^2 - |N_{2k}|^2\right)\right]\right\} \tag{13-3-13}$$

where $\nu$ is a variable that is optimized to yield the tightest possible bound. 

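The averaging in (13-3-13) is performed with respect to the statistics of the 
noise components and the statistics of the weighting coefficients $\{\beta_k\}$, which are 
random as a consequence of the statistical nature of the interference. Keeping 
the $\{\beta_k\}$ fixed and averaging over the noise statistics first, we obtain 

$$\begin{aligned} P_2(\boldsymbol{\beta}) &= E\left\{\exp\left(-\nu\sum_{k=1}^{L}\beta_k|2\mathcal{E}_c + N_{1k}|^2 + \nu\sum_{k=1}^{L}\beta_k|N_{2k}|^2\right)\right\} \\ &= \prod_{k=1}^{L} E[\exp(-\nu\beta_k|2\mathcal{E}_c + N_{1k}|^2)]\, E[\exp(\nu\beta_k|N_{2k}|^2)] \\ &= \prod_{k=1}^{L} \frac{1}{1-4\nu^2}\exp\left(\frac{-4\mathcal{E}_c^2\beta_k\nu}{1+2\nu}\right) \end{aligned} \tag{13-3-14}$$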


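Since the FSK tones are jammed with probability $\alpha$, it follows that $\beta_k = 
[2\mathcal{E}_c(N_0 + J_0/\alpha)]^{-1}$ with probability $\alpha$ and $(2\mathcal{E}_c N_0)^{-1}$ with probability $1 - \alpha$. 
Hence, the Chernoff bound is 

$$\begin{aligned} P_2 &\le \prod_{k=1}^{L}\left\{\frac{\alpha}{1-4\nu^2}\exp\left[\frac{-2\mathcal{E}_c\nu}{(N_0 + J_0/\alpha)(1+2\nu)}\right] + \frac{1-\alpha}{1-4\nu^2}\exp\left[\frac{-2\mathcal{E}_c\nu}{N_0(1+2\nu)}\right]\right\} \\ &= \left\{\frac{\alpha}{1-4\nu^2}\exp\left[\frac{-2\mathcal{E}_c\nu}{(N_0 + J_0/\alpha)(1+2\nu)}\right] + \frac{1-\alpha}{1-4\nu^2}\exp\left[\frac{-2\mathcal{E}_c\nu}{N_0(1+2\nu)}\right]\right\}^{L} \end{aligned} \tag{13-3-15}$$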
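
Before any simplification, the bound in (13-3-15) can be minimized over $\nu$ 
numerically. The sketch below uses a plain grid search over $0 < \nu < \tfrac{1}{2}$; the 
grid size, the function names, and the parameter values are assumptions made 
for illustration. 

```python
from math import exp


def chernoff_bound(nu: float, alpha: float, Ec: float, J0: float,
                   N0: float, L: int) -> float:
    """Right-hand side of (13-3-15); requires 0 < nu < 1/2 and N0 > 0."""
    pre = 1.0 / (1.0 - 4.0 * nu * nu)
    jammed = alpha * pre * exp(
        -2.0 * Ec * nu / ((N0 + J0 / alpha) * (1.0 + 2.0 * nu)))
    clear = (1.0 - alpha) * pre * exp(
        -2.0 * Ec * nu / (N0 * (1.0 + 2.0 * nu)))
    return (jammed + clear) ** L


def tightest_bound(alpha: float, Ec: float, J0: float, N0: float,
                   L: int, steps: int = 5000):
    """Grid search over nu in (0, 1/2); returns (bound, nu)."""
    candidates = (
        (chernoff_bound(i * 0.5 / steps, alpha, Ec, J0, N0, L), i * 0.5 / steps)
        for i in range(1, steps)
    )
    return min(candidates)


# Jammer-dominated case: N0 is tiny compared with J0/alpha.
bound, nu_opt = tightest_bound(alpha=0.75, Ec=4.0, J0=1.0, N0=1e-6, L=1)
print(f"nu = {nu_opt:.4f}, bound = {bound:.5f}")
```

With these numbers ($\mathcal{E}_c/J_0 = 4$, $\alpha = 0.75$, negligible $N_0$) the search settles 
near $\nu = \tfrac{1}{4}$, where the bound is close to $e^{-1}$. 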

The next step is to optimize the bound in (13-3-15) with respect to the 
variable $\nu$. In its present form, however, the bound is messy to manipulate. A 
significant simplification occurs if we assume that $J_0/\alpha \gg N_0$, which renders the 
second term in (13-3-15) negligible compared with the first. Alternatively, we 
let $N_0 = 0$, so that the bound on $P_2$ reduces to 

$$P_2 \le \left\{\frac{\alpha}{1-4\nu^2}\exp\left[\frac{-2\alpha\mathcal{E}_c\nu}{J_0(1+2\nu)}\right]\right\}^{L} \tag{13-3-16}$$


The minimum value of this bound with respect to $\nu$ and the maximum with 
respect to $\alpha$ (worst-case partial-band interference) is easily shown to occur 
when $\alpha = 3J_0/\mathcal{E}_c \le 1$ and $\nu = \tfrac{1}{4}$. For these values of the parameters, (13-3-16) 
reduces to 

$$P_2 \le P_2(L) = \left(\frac{4}{e\gamma_c}\right)^{L} = \left(\frac{1.47}{\gamma_c}\right)^{L}, \qquad \gamma_c = \frac{\mathcal{E}_c}{J_0} = \frac{\mathcal{E}_b}{LJ_0} \ge 3 \tag{13-3-17}$$

where $\gamma_c$ is the SNR per chip in the $L$-chip symbol. Equivalently, 

$$P_2(L) = \left[\frac{1.47\,L\,(J_{av}/P_{av})}{W/R}\right]^{L}, \qquad \frac{W/R}{L\,(J_{av}/P_{av})} \ge 3 \tag{13-3-18}$$

The result in (13-3-17) was first derived by Viterbi and Jacobs (1975). 

We observe that the probability of error for the worst-case partial-band 
interference decreases exponentially with an increase in the SNR per chip $\gamma_c$. 
This result is very similar to the performance characteristics of diversity 
techniques for Rayleigh fading channels (see Section 14-4). We may express 
the right-hand side of (13-3-17) in the form 

$$P_2(L) = \exp[-\gamma_b h(\gamma_c)] \tag{13-3-19}$$

where the function $h(\gamma_c)$ is defined as 

$$h(\gamma_c) = -\frac{1}{\gamma_c}\left[\ln\left(\frac{4}{\gamma_c}\right) - 1\right] \tag{13-3-20}$$

A plot of $h(\gamma_c)$ is given in Fig. 13-3-5. We observe that the function has a 
maximum value of $\tfrac{1}{4}$ at $\gamma_c = 4$. Consequently, there is an optimum SNR per 
chip of $10\log\gamma_c = 6$ dB. At the optimum SNR, the error rate is upper-bounded 
as 

$$P_2 \le P_2(L_{opt}) = e^{-\gamma_b/4} \tag{13-3-21}$$

When we compare the error probability bound in (13-3-21) with the error 
probability for binary FSK in spectrally flat noise, which is given by (13-3-1), 
we see that the combined effect of worst-case partial-band interference and the 
noncoherent combining loss in the square-law combining of the $L$ chips is 3 dB. 
We emphasize, however, that for a given $\mathcal{E}_b/J_0$, the loss is greater when the 
order of diversity is not optimally selected. 

FIGURE 13-3-5 Graph of the function $h(\gamma_c)$. 
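
The choice of diversity order implied by (13-3-17) through (13-3-21) can be 
explored with a few lines of code. The sketch below is illustrative only: the 
function names and the 16 dB operating point are assumptions, and the search 
is restricted to integer $L$ with $\gamma_c \ge 3$, where (13-3-17) applies. 

```python
from math import e, exp, log


def p2_worst_case(gamma_b: float, L: int) -> float:
    """Chernoff bound (13-3-17) for L-fold diversity, gamma_c = gamma_b / L."""
    gamma_c = gamma_b / L
    if gamma_c < 3:
        raise ValueError("(13-3-17) requires an SNR per chip of at least 3")
    return (4.0 / (e * gamma_c)) ** L


def h(gamma_c: float) -> float:
    """Exponent function (13-3-20)."""
    return -(log(4.0 / gamma_c) - 1.0) / gamma_c


def best_diversity(gamma_b: float) -> int:
    """Integer L (with gamma_c >= 3) that minimizes the bound."""
    l_max = int(gamma_b // 3)
    return min(range(1, l_max + 1), key=lambda L: p2_worst_case(gamma_b, L))


gamma_b = 40.0                       # about 16 dB SNR per bit
L_opt = best_diversity(gamma_b)
print(L_opt, gamma_b / L_opt)        # optimum order and resulting SNR per chip
print(p2_worst_case(gamma_b, L_opt), exp(-gamma_b / 4))
print(p2_worst_case(gamma_b, 1))     # no diversity, for comparison
```

At this operating point the search picks $L = 10$, that is $\gamma_c = 4$, and the bound 
coincides with $e^{-\gamma_b/4}$ as in (13-3-21). 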




FIGURE 13-3-6 Performance of binary and octal FSK with $L$-order diversity for a 
channel with worst-case partial-band interference. 

Coding provides a means for improving the performance of the 
frequency-hopped system corrupted by partial-band interference. In particular, if a block 
orthogonal code is used, with $M = 2^k$ code words and $L$th-order diversity per 
code word, the probability of a code word error is upper-bounded as 

$$P_M \le (2^k - 1)P_2(L) = (2^k - 1)\left(\frac{1.47}{\gamma_c}\right)^{L} = (2^k - 1)\left(\frac{1.47}{k\gamma_b/L}\right)^{L} \tag{13-3-22}$$

and the equivalent bit error probability is upper-bounded as 

$$P_b \le 2^{k-1}\left(\frac{1.47}{k\gamma_b/L}\right)^{L} \tag{13-3-23}$$

Figure 13-3-6 illustrates the probability of a bit error for $L = 1, 2, 4, 8$ and 
$k = 1, 3$. With an optimum choice of diversity, the upper bound can be 
expressed as 

$$P_b \le 2^{k-1}\exp(-\tfrac{1}{4}k\gamma_b) = \tfrac{1}{2}\exp[-k(\tfrac{1}{4}\gamma_b - \ln 2)] \tag{13-3-24}$$

Thus, we have an improvement in performance by an amount equal to 
$10\log[k(1 - 2.77/\gamma_b)]$. For example, if $\gamma_b = 10$ and $k = 3$ (octal modulation) 
then the gain is 3.4 dB, while if $k = 5$ then the gain is 5.6 dB. 
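
The gain figures quoted above follow from comparing the exponent of (13-3-24) 
with that of (13-3-21). A minimal check, with a hypothetical function name and 
$\gamma_b$ taken as a linear ratio rather than in decibels: 

```python
from math import log, log10


def orthogonal_coding_gain_db(k: int, gamma_b: float) -> float:
    """Gain 10*log10[k(1 - 4 ln 2 / gamma_b)] of (13-3-24) over (13-3-21);
    note that 4 ln 2 is approximately 2.77."""
    return 10 * log10(k * (1 - 4 * log(2) / gamma_b))


for k in (3, 5):
    print(k, round(orthogonal_coding_gain_db(k, 10.0), 1), "dB")
```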

Additional gains can be achieved by employing concatenated codes in 
conjunction with soft-decision decoding. In the example below, we employ a 
dual-$k$ convolutional code as the outer code and a Hadamard code as the inner 
code on the channel with partial-band interference. 


Example 13-3-1 

Suppose we use a Hadamard $H(n, k)$ constant weight code with on-off 
keying (OOK) modulation for each code bit. The minimum distance of the 
code is $d_{\min} = \tfrac{1}{2}n$ and, hence, the effective order of diversity obtained with 
OOK modulation is $\tfrac{1}{2}d_{\min} = \tfrac{1}{4}n$. There are $\tfrac{1}{2}n$ frequency-hopped tones 
transmitted per code word. Hence, 

$$\gamma_c = \frac{k}{\tfrac{1}{2}n}\gamma_b = 2R_c\gamma_b \tag{13-3-25}$$

when this code is used alone. The bit error rate performance for 
soft-decision decoding of these codes for the partial-band interference 
channel is upper-bounded as 

$$P_b \le 2^{k-1}P_2(\tfrac{1}{2}d_{\min}) = 2^{k-1}\left(\frac{1.47}{\gamma_c}\right)^{n/4} \tag{13-3-26}$$

Now, if a Hadamard $(n, k)$ code is used as the inner code and a rate 1/2 
dual-$k$ convolutional code (see Section 8-2-6) is the outer code, the bit error 
performance in the presence of worst-case partial-band interference is (see 
(8-2-40)) 

$$P_b \le \frac{2^{k-1}}{2^k - 1}\sum_{m=4}^{\infty}\beta_m P_2(\tfrac{1}{2}m\,d_{\min}) = \frac{2^{k-1}}{2^k - 1}\sum_{m=4}^{\infty}\beta_m P_2(\tfrac{1}{4}mn) \tag{13-3-27}$$

where $P_2(L)$ is given by (13-3-17) with 

$$\gamma_c = \frac{k}{n}\gamma_b = R_c\gamma_b \tag{13-3-28}$$

Figure 13-3-7 illustrates the performance of the dual-$k$ codes for $k = 5$, 4, 
and 3 concatenated with the Hadamard $H(20, 5)$, $H(16, 4)$, and $H(12, 3)$ 
codes, respectively. 

FIGURE 13-3-7 Performance of dual-$k$ codes concatenated with Hadamard 
codes for a channel with worst-case partial-band interference. 
(Horizontal axis: SNR per bit, $\gamma_b$ (dB).) 

In the above discussion, we have focused on soft-decision decoding. On the 
other hand, the performance achieved with hard-decision decoding is sig- 
nificantly (several decibels) poorer than that obtained with soft-decision 
decoding. In a concatenated coding scheme, however, a mixture involving 
soft-decision decoding of the inner code and hard-decision decoding of the 
outer code represents a reasonable compromise between decoding complexity 
and performance. 

Finally, we wish to indicate that another serious threat in a FH spread 
spectrum system is partial-band multitone jamming. This type of interference is 
similar in effect to partial-band spectrally flat noise jamming. Diversity 
obtained through coding is an effective means for improving the performance 
of the FH system. An additional improvement is achieved by properly 
weighting the demodulator outputs so as to suppress the effects of the jammer. 

13-3-3 A CDMA System Based on FH Spread Spectrum 
Signals 

In Section 13-2-2, we considered a CDMA system based on use of DS spread 
spectrum signals. As previously indicated, it is also possible to have a CDMA 
system based on FH spread spectrum signals. Each transmitter-receiver pair in 
such a system is assigned its own pseudo-random frequency-hopping pattern. 
Aside from this distinguishing feature, the transmitters and receivers of all the 
users may be identical in that they may have identical encoders, decoders, 
modulators, and demodulators. 

CDMA systems based on FH spread spectrum signals are particularly 
attractive for mobile (land, air, sea) users because timing requirements are not 
as stringent as in a PN spread spectrum signal. In addition, frequency synthesis 
techniques and associated hardware have been developed that make it possible 
to frequency-hop over bandwidths that are significantly larger than those 
currently possible with DS spread spectrum systems. Consequently, larger 
processing gains are possible with FH. The capacity of CDMA with FH is also 
relatively high. Viterbi (1978) has shown that with dual-$k$ codes and $M$-ary 
FSK modulation, it is possible to accommodate up to $\tfrac{3}{8}W/R$ simultaneous users 
who transmit at an information rate $R$ bits/s over a channel with bandwidth $W$. 

One of the earliest CDMA systems based on FH coded spread spectrum 
signals was built to provide multiple-access tactical satellite communications 
for small mobile (land, sea, air) terminals each of which transmitted relatively 
short messages over the channel intermittently. The system was called the 
Tactical Transmission System (TATS) and it is described in a paper by 
Drouilhet and Bernstein (1969). 

An octal Reed-Solomon (7,2) code is used in the TATS system. Thus, two 
3 bit information symbols from the input to the encoder are used to generate a 
seven-symbol code word. Each 3 bit coded symbol is transmitted by means of 
octal FSK modulation. The eight possible frequencies are spaced $1/T_c$ Hz 
apart, where $T_c$ is the time (chip) duration of a single frequency transmission. 
In addition to the seven symbols in a code word, an eighth symbol is included. 
That symbol and its corresponding frequency are fixed and transmitted at the 
beginning of each code word for the purpose of providing timing and 
frequency synchronization† at the receiver. Consequently, each code word is 
transmitted in $8T_c$ s. 

TATS was designed to transmit at information rates of 75 and 2400 bits/s. 
Hence, $T_c = 10$ ms and 312.5 µs, respectively. Each frequency tone 
corresponding to a code symbol is frequency-hopped. Hence, the hopping rate is 
100 hops/s at the 75 bits/s rate and 3200 hops/s at the 2400 bits/s rate. 

There are $M = 2^6 = 64$ code words in the Reed-Solomon (7, 2) code and the 
minimum distance of the code is $d_{\min} = 6$. This means that the code provides an 
effective order of diversity equal to 6. 
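
The chip durations and hopping rates quoted for TATS follow from the code 
word structure: 6 information bits are carried by 8 chips (seven code symbols 
plus the sync symbol). The short sketch below reproduces the arithmetic; the 
constant and function names are introduced here for illustration. 

```python
INFO_SYMBOLS_PER_WORD = 2    # Reed-Solomon (7,2): two information symbols
BITS_PER_SYMBOL = 3          # octal symbols
CHIPS_PER_WORD = 8           # 7 code symbols + 1 sync symbol


def chip_duration_s(info_rate_bps: float) -> float:
    """Chip duration T_c: one code word (6 information bits) spans 8 chips."""
    word_time = INFO_SYMBOLS_PER_WORD * BITS_PER_SYMBOL / info_rate_bps
    return word_time / CHIPS_PER_WORD


def hop_rate_per_s(info_rate_bps: float) -> float:
    """One hop per chip, so the hopping rate is 1/T_c."""
    return 1.0 / chip_duration_s(info_rate_bps)


for rate in (75, 2400):
    print(rate, "bits/s:", chip_duration_s(rate), "s per chip,",
          hop_rate_per_s(rate), "hops/s")
```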

At the receiver, the received signal is first dehopped and then demodulated 
by passing it through a parallel bank of eight matched filters, where each filter 
is tuned to one of the eight possible frequencies. Each filter output is 
envelope-detected, quantized to 4 bits (one of 16 levels), and fed to the 
decoder. The decoder takes the 56 filter outputs corresponding to the 
reception of each seven-symbol code word and forms 64 decision variables 
corresponding to the 64 possible code words in the (7,2) code by linearly 
combining the appropriate envelope detected outputs. A decision is made in 
favor of the code word having the largest decision variable. 

† Since mobile users are involved, there is a Doppler frequency offset associated with 
transmission. This frequency offset must be tracked and compensated for in the demodulation of 
the signal. The sync symbol is used for this purpose. 

By limiting the matched filter outputs to 16 levels, interference (crosstalk) 
from other users of the channel causes a relatively small loss in performance 
(0.75 dB with strong interference on one chip and 1.5 dB with strong 
interference on two chips out of the seven). The AGC used in TATS has a 
time constant greater than the chip interval $T_c$, so that no attempt is made to 
perform optimum weighting of the demodulator outputs as described in 
Section 13-3-2. 

The derivation of the error probability for the TATS signal in AWGN and 
worst-case partial-band interference is left as an exercise for the reader 
(Problems 13-23 and 13-24). 


13-4 OTHER TYPES OF SPREAD SPECTRUM 
SIGNALS 

DS and FH are the most common forms of spread spectrum signals used in 
practice. However, other methods may be used to introduce pseudo- 
randomness in a spread spectrum signal. One method, which is analogous to 
FH, is time hopping (TH). In TH, a time interval, which is selected to be much 
larger than the reciprocal of the information rate, is subdivided into a large 
number of time slots. The coded information symbols are transmitted in a 
pseudo-randomly selected time slot as a block of one or more code words. PSK 
modulation may be used to transmit the coded bits. 

For example, suppose that a time interval $T$ is subdivided into 1000 time 
slots of width $T/1000$ each. With an information bit rate of $R$ bits/s, the 
number of bits to be transmitted in $T$ s is $RT$. Coding increases this number to 
$RT/R_c$ bits, where $R_c$ is the coding rate. Consequently, in a time interval of 
$T/1000$ s, we must transmit $RT/R_c$ bits. If binary PSK is used as the 
modulation method, the bit rate is $1000R/R_c$ and the bandwidth required is 
approximately $W = 1000R/R_c$. 
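
The arithmetic of this example generalizes to any number of slots. In the 
sketch below the function name and the numerical values (2400 bits/s, a rate 
1/2 code) are assumptions chosen only to produce a concrete figure. 

```python
def th_burst_rate_bps(info_rate_bps: float, code_rate: float,
                      n_slots: int) -> float:
    """Bit rate inside one time slot: R*T/R_c coded bits are sent in T/n_slots
    seconds, so the interval length T cancels out."""
    return n_slots * info_rate_bps / code_rate


burst_rate = th_burst_rate_bps(info_rate_bps=2400, code_rate=0.5, n_slots=1000)
print(f"burst rate = {burst_rate:.0f} bits/s, "
      f"bandwidth with binary PSK is approximately {burst_rate:.0f} Hz")
```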

A block diagram of a transmitter and a receiver for a TH spread spectrum 
system is shown in Fig. 13-4-1. Due to the burst characteristics of the 
transmitted signal, buffer storage must be provided at the transmitter in a TH 
system, as shown in Fig. 13-4-1. A buffer may also be used at the receiver to 
provide a uniform data stream to the user. 

Just as partial-band interference degrades an uncoded FH spread spectrum 
system, partial-time (pulsed) interference has a similar effect on a TH spread 
spectrum system. Coding and interleaving are effective means for combatting 
this type of interference, as we have already demonstrated for FH and DS 
systems. Perhaps the major disadvantage of a TH system is the stringent timing 
requirements compared not only with FH but, also, with DS. 

FIGURE 13-4-1 Block diagram of time-hopping (TH) spread spectrum system. 

Other types of spread spectrum signals can be obtained by combining DS, 
FH, and TH. For example, we may have a hybrid DS/FH, which means that a 
PN sequence is used in combination with frequency hopping. The signal 
transmitted on a single hop consists of a DS spread spectrum signal which is 
demodulated coherently. However, the received signals from different hops are 
combined noncoherently (envelope or square-law combining). Since coherent 
detection is performed within a hop, there is an advantage obtained relative to 
a pure FH system. However, the price paid for the gain in performance is an 
increase in complexity, greater cost, and more stringent timing requirements. 

Another possible hybrid spread spectrum signal is DS/TH. This does not 
seem to be as practical as DS/FH, primarily because of an increase in system 
complexity and more stringent timing requirements. 


13-5 SYNCHRONIZATION OF SPREAD SPECTRUM 
SYSTEMS 

Time synchronization of the receiver to the received spread spectrum signal 
may be separated into two phases. There is an initial acquisition phase and a 
tracking phase after the signal has been initially acquired. 

Acquisition In a direct sequence spread spectrum system, the PN code 
must be time-synchronized to within a small fraction of the chip interval 
$T_c = 1/W$. The problem of initial synchronization may be viewed as one in 
which we attempt to synchronize in time the receiver clock to the transmitter 
clock. Usually, extremely accurate and stable time clocks are used in spread 
spectrum systems. Consequently, accurate time clocks result in a reduction of 
the time uncertainty between the receiver and the transmitter. However, there 
is always an initial timing uncertainty due to range uncertainty between the 
transmitter and the receiver. This is especially a problem when communication 
is taking place between two mobile users. In any case, the usual procedure for 
establishing initial synchronization is for the transmitter to send a known 
pseudo-random data sequence to the receiver. The receiver is continuously in a 
search mode looking for this sequence in order to establish initial 
synchronization. 

Let us suppose that the initial timing uncertainty is $T_u$ and the chip duration 
is $T_c$. If initial synchronization is to take place in the presence of additive noise 
and other interference, it is necessary to dwell for $T_d = NT_c$ in order to test 
synchronism at each time instant. If we search over the time uncertainty 
interval in (coarse) time steps of $\tfrac{1}{2}T_c$, then the time required to establish initial 
synchronization is 

$$T_{\text{init sync}} = \frac{T_u}{\tfrac{1}{2}T_c}\,NT_c = 2NT_u \tag{13-5-1}$$

Clearly, the synchronization sequence transmitted to the receiver must be at 
least as long as $2NT_c$ in order for the receiver to have sufficient time to perform 
the necessary search in a serial fashion. 
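
A quick numerical reading of (13-5-1): the search visits $T_u/(\tfrac{1}{2}T_c)$ trial 
delays and dwells $NT_c$ at each. The values below (1 ms uncertainty, 1 µs chips, 
1000-chip dwell) are invented for illustration, as is the function name. 

```python
def serial_acquisition_time_s(t_uncertainty: float, t_chip: float,
                              n_dwell_chips: int,
                              step_fraction: float = 0.5) -> float:
    """Worst-case serial search time: number of trial delays times dwell."""
    trial_delays = t_uncertainty / (step_fraction * t_chip)
    dwell = n_dwell_chips * t_chip
    return trial_delays * dwell


t_acq = serial_acquisition_time_s(t_uncertainty=1e-3, t_chip=1e-6,
                                  n_dwell_chips=1000)
print(f"acquisition time = {t_acq:.3f} s")   # equals 2 * N * T_u
```

With half-chip steps the chip duration cancels, so the search time depends 
only on the dwell length $N$ and the timing uncertainty $T_u$. 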

In principle, matched filtering or cross-correlation are optimum methods for 
establishing initial synchronization. A filter matched to the known data 
waveform generated from the known pseudo-random sequence continuously 
looks for exceedance of a predetermined threshold. When this occurs, initial 
synchronization is established and the demodulator enters the "data receive" 
mode. 

Alternatively, we may use a sliding correlator as shown in Fig. 13-5-1. The 
correlator cycles through the time uncertainty, usually in discrete time intervals 
of $\tfrac{1}{2}T_c$, and correlates the received signal with the known synchronization 
sequence. The cross-correlation is performed over the time interval $NT_c$ ($N$ 
chips) and the correlator output is compared with a threshold to determine if 
the known signal sequence is present. If the threshold is not exceeded, the 
known reference sequence is advanced in time by $\tfrac{1}{2}T_c$ s and the correlation 
process is repeated. These operations are performed until a signal is detected 
or until the search has been performed over the time uncertainty interval $T_u$. In 
the latter case, the search process is then repeated. 
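
The serial search just described can be sketched in a few lines. This is a toy 
model under stated assumptions: the 13-chip Barker code stands in for the 
known pseudo-random sequence, the signal is sampled twice per chip so that one 
sample step equals $\tfrac{1}{2}T_c$, noise is omitted to keep the outcome deterministic, 
and the threshold of 0.75 is arbitrary. Sliding the window along the received 
samples is equivalent to advancing the reference. 

```python
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
SAMPLES_PER_CHIP = 2   # one sample step corresponds to half a chip


def upsample(chips, samples_per_chip=SAMPLES_PER_CHIP):
    """Repeat each chip value so the sequence is sampled at half-chip spacing."""
    return [c for c in chips for _ in range(samples_per_chip)]


def sliding_correlator(received, reference, threshold):
    """Serial search in half-chip steps. Returns the first delay (in samples)
    whose normalized correlation exceeds the threshold, or None if the whole
    uncertainty interval has been searched without a detection."""
    window = len(reference)
    for delay in range(len(received) - window + 1):
        corr = sum(received[delay + i] * reference[i]
                   for i in range(window)) / window
        if corr > threshold:
            return delay
    return None


reference = upsample(BARKER_13)
true_delay = 37   # unknown to the receiver, in half-chip units
received = [0.0] * true_delay + [float(x) for x in reference] + [0.0] * 20
found = sliding_correlator(received, reference, threshold=0.75)
print(found)
```

A practical receiver would add noise, a dwell long enough to average it out, 
and a confirmation step before declaring synchronization. 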

FIGURE 13-5-1 A sliding correlator for DS signal acquisition. 

FIGURE 13-5-2 System for acquisition of a FH signal. 

A similar process may also be used for FH signals. In this case, the problem 
is to synchronize the PN code that controls the hopped frequency pattern. To 
accomplish this initial synchronization, a known frequency hopped signal is 
transmitted to the receiver. The initial acquisition system at the receiver looks 
for this known FH signal pattern. For example, a bank of matched filters tuned 
to the transmitted frequencies in the known pattern may be employed. Their 
outputs must be properly delayed, envelope- or square-law-detected, weighted, 
if necessary, and added (noncoherent integration) to produce the signal output 
which is compared with a threshold. A signal present is declared when the 
threshold is exceeded. The search process is usually performed continuously in 
time until a threshold is exceeded. A block diagram illustrating this signal 
acquisition scheme is given in Fig. 13-5-2. As an alternative, a single 
matched-filter-envelope detector pair may be used, preceded by a frequency- 
hopping pattern generator and followed by a post-detection integrator and a 
threshold detector. This configuration, shown in Fig. 13-5-3, is based on a serial 
search and is akin to the sliding correlator for DS spread spectrum signals. 

The sliding correlator for the DS signals or its counterpart shown in Fig. 
13-5-3 for FH signals basically performs a serial search that is generally 
time-consuming. As an alternative, one may introduce some degree of 
parallelism by having two or more such correlators operating in parallel and 
searching over nonoverlapping time slots. In such a case, the search time is 
reduced at the expense of a more complex and costly implementation. Figure 
13-5-2 represents such a parallel realization for the FH signals. 

During the search mode, there may be false alarms that occur at the 
designed false alarm rate of the system. To handle the occasional false alarms, 
it is necessary to have an additional method or circuit that checks to confirm 

FIGURE 13-5-3 Alternative system for acquisition of a FH signal. 


that the received signal at the output of the correlator remains above the 
threshold. With such a detection strategy, a large noise pulse that causes a false 
alarm will cause only a temporary exceedance of the threshold. On the other 
hand, when a signal is present, the correlator or matched filter output will stay 
above the threshold for the duration of the transmitted signal. Thus, if 
confirmation fails, the search is resumed. 

Another initial search strategy, called a sequential search, has been 
investigated by Ward (1965, 1977). In this method, the dwell time at each delay 
in the search process is made variable by employing a correlator with a 
variable integration period whose (biased) output is compared with two 
thresholds. Thus, there are three possible decisions: 

1 if the upper threshold is exceeded by the correlator output, initial 
synchronization is declared established; 

2 if the correlator output falls below the lower threshold, the signal is 
declared absent at that delay and the search process resumes at a different 
delay; 

3 if the correlator output falls between the two thresholds, the integration 
time is increased by one chip and the resulting output is compared with the two 
thresholds again. 

Hence, steps 1, 2, and 3 are repeated for each chip interval until the correlator 
output either exceeds the upper threshold or falls below the lower threshold. 
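
The three-way decision rule above can be illustrated with a short Python sketch. It is a simplified illustration, not Ward's exact procedure: the correlator is modeled as a running sum of per-chip samples with a constant bias subtracted, and the names `bias`, `upper`, `lower`, and `max_chips` are assumptions introduced here.

```python
def sequential_dwell(samples, bias, upper, lower, max_chips):
    """Variable-dwell test at one code delay.

    Accumulates (sample - bias) one chip at a time and compares the
    running (biased) sum with two thresholds. Returns a pair
    (decision, chips_used), where decision is "sync", "dismiss",
    or "undecided" (the chip budget ran out between the thresholds).
    """
    total = 0.0
    used = 0
    for x in samples:
        used += 1
        total += x - bias
        if total > upper:        # decision 1: synchronization declared
            return "sync", used
        if total < lower:        # decision 2: signal absent at this delay
            return "dismiss", used
        if used >= max_chips:    # safety stop, not part of the text's rule
            break
    # decision 3 would keep integrating; here the budget is exhausted
    return "undecided", used
```

Because the bias drives the sum downward when only noise is present, most wrong delays are dismissed after a few chips, which is where the reduction in average search time comes from.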
FIGURE 13-5-4 Initial search for Doppler frequency offset in a DS system. 


The sequential search method falls in the class of sequential estimation 
methods proposed by Wald (1947), which are known to result in a more 
efficient search in the sense that the average search time is minimized. Hence, 
the search time for a sequential search is less than that for the fixed dwell time 
integrator. 

In the above discussion, we have considered only time uncertainty in 
establishing initial synchronization. However, another aspect of initial synchro- 
nization is frequency uncertainty. If the transmitter and/or the receiver are 
mobile, the relative velocity between them results in a Doppler frequency shift 
in the received signal relative to the transmitted signal. Since the receiver does 
not usually know the relative velocity, a priori, the Doppler frequency shift is 
unknown and must be determined by means of a frequency search method. 
Such a search is usually accomplished in parallel over a suitably quantized 
frequency uncertainty interval and serially over the time uncertainty interval. 
A block diagram of this scheme is shown in Fig. 13-5-4. Appropriate Doppler 
frequency search methods can also be devised for FH signals. 
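
A joint search that is serial in time and parallel in frequency can be sketched as follows. This is a minimal sketch under stated assumptions: one complex sample per chip, Doppler expressed in cycles per sample, a hypothetical list `freq_bins` that quantizes the frequency uncertainty interval, and a threshold applied to the normalized correlation magnitude.

```python
import cmath
import math
import random

def doppler_search(received, code, freq_bins, threshold):
    """Search serially over delay and, at each delay, over all Doppler bins.

    For every candidate delay the received samples are despread with the
    code and derotated by each candidate frequency; the bin with the
    largest normalized magnitude is compared with the threshold.
    Returns (delay, frequency) on detection, otherwise None.
    """
    n = len(code)
    for offset in range(len(received) - n + 1):
        best_mag, best_freq = 0.0, None
        for f in freq_bins:  # these branches would run in parallel in hardware
            acc = 0j
            for k in range(n):
                acc += received[offset + k] * code[k] * cmath.exp(-2j * math.pi * f * k)
            mag = abs(acc) / n
            if mag > best_mag:
                best_mag, best_freq = mag, f
        if best_mag > threshold:
            return offset, best_freq
    return None
```

The bin spacing has to be fine enough that a signal midway between two bins still correlates above the threshold over the integration interval; that trade-off is not modeled here.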

Tracking Once the signal is acquired, the initial search process is stopped 
and fine synchronization and tracking begin. The tracking maintains the PN 
code generator at the receiver in synchronism with the incoming signal. 
Tracking includes both fine chip synchronization and, for coherent demodula- 
tion, carrier phase tracking. 

The commonly used tracking loop for a DS spread spectrum signal is the 
FIGURE 13-5-5 Delay-locked loop (DLL) for PN code tracking. 


delay-locked loop (DLL), which is shown in Fig. 13-5-5. In this tracking loop, 
the received signal is applied to two multipliers, where it is multiplied by two 
outputs from the local PN code generator, which are delayed relative to each 
other by an amount $2\delta \le T_c$. Thus, the product signals are the cross-correlations 
between the received signal and the PN sequence at the two values 
of delay. These products are bandpass-filtered and envelope- (or square-law-) 
detected and then subtracted. This difference signal is applied to the loop filter 
that drives the voltage controlled clock (VCC). The VCC serves as the clock 
for the PN code signal generator. 

If the synchronism is not exact, the filtered output from one correlator will 
exceed the other and the VCC will be appropriately advanced or delayed. At 
the equilibrium point, the two filtered correlator outputs will be equally 
displaced from the peak value, and the PN code generator output will be 
exactly synchronized to the received signal that is fed to the demodulator. We 
observe that this implementation of the DLL for tracking a DS signal is 
equivalent to the early-late gate bit tracking synchronizer previously discussed 
in Section 6-3-2 and shown in Fig. 6-3-5. 
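
The behavior of the loop can be illustrated numerically. The sketch below is an idealization, not the circuit of Fig. 13-5-5: it assumes a noise-free triangular PN autocorrelation, square-law detection, and a first-order loop with a hypothetical gain `gain`; all times are in units of the chip interval.

```python
def pn_autocorr(tau, Tc=1.0):
    """Idealized PN autocorrelation: a triangle of width 2*Tc.

    The small negative floor of a real m-sequence is ignored.
    """
    return max(0.0, 1.0 - abs(tau) / Tc)

def dll_error(timing_error, delta=0.5, Tc=1.0):
    """Difference of the two square-law-detected correlator outputs.

    The two arms sample the correlation at timing_error - delta and
    timing_error + delta; the difference is zero at exact synchronism
    and has the sign of the timing error near lock.
    """
    arm_a = pn_autocorr(timing_error - delta, Tc) ** 2
    arm_b = pn_autocorr(timing_error + delta, Tc) ** 2
    return arm_a - arm_b

def track(initial_error, gain=0.25, steps=20, delta=0.5):
    """First-order loop: move the clock against the error signal each step."""
    err = initial_error
    for _ in range(steps):
        err -= gain * dll_error(err, delta)
    return err
```

With `delta = 0.5` the error signal is linear in the timing error over half a chip on either side of lock, so the loop error shrinks geometrically in this noise-free setting.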

An alternative method for time tracking a DS signal is to use a tau-dither 
loop (TDL), illustrated by the block diagram in Fig. 13-5-6. The TDL employs 
a single “arm” instead of the two “arms” shown in Fig. 13-5-5. By providing a 
suitable gating waveform, it is possible to make this “single-arm” implementation 
appear to be equivalent to the “two-arm” realization. In this case, the 
cross-correlation is regularly sampled at two values of delay, by stepping the 
code clock forward or backward in time by an amount $\delta$. The envelope of the 
cross-correlation that is sampled at $\pm\delta$ has an amplitude modulation whose 
phase relative to the tau-dither modulator determines the sign of the tracking 
error. 


FIGURE 13-5-6 Tau-dither loop (TDL). 


A major advantage of the TDL is the less costly implementation resulting 
from elimination of one of the two arms that are employed in the conventional 
DLL. A second and less apparent advantage is that the TDL does not suffer 
from performance degradation that is inherent in the DLL when the amplitude 
gain in the two arms is not properly balanced. 

The DLL (and its equivalent, the TDL) generates an error signal by 
sampling the signal correlation function at $\pm\delta$ off the peak as shown in Fig. 
13-5-7(a). This generates an error signal as shown in Fig. 13-5-7(b). The 
analysis of the performance of the DLL is similar to that for the phase-locked 
loop (PLL) carried out in Section 6-3. If it were not for the envelope detectors 
in the two arms of the DLL, the loop would resemble a Costas loop. In 
general, the variance of the time estimation error in the DLL is inversely 
proportional to the loop SNR, which depends on the input SNR to the loop 
and the loop bandwidth. Its performance is somewhat degraded as in the 
squaring PLL by the nonlinearities inherent in the envelope detectors, but this 
degradation is relatively small. 


FIGURE 13-5-7 Autocorrelation function and tracking error signal for DLL. 

FIGURE 13-5-8 Tracking method for FH signals: (a) tracking loop for FH signals; (b) waveform for tracking an FH signal. [From Pickholtz et al. (1982). © 1982 IEEE.] 


A typical tracking technique for FH spread spectrum signals is illustrated in 
Fig. 13-5-8(a). This method is also based on the premise that, although initial 
acquisition has been achieved, there is a small timing error between the 
received signal and the receiver clock. The bandpass filter is tuned to a single 
intermediate frequency and its bandwidth is of the order of $1/T_c$, where $T_c$ is 
the chip interval. Its output is envelope-detected and then multiplied by the 
clock signal to produce a three-level signal, as shown in Fig. 13-5-8(b), which 
drives the loop filter. Note that when the chip transitions from the locally 
generated sinusoidal waveform do not occur at the same time as the transitions 
in the incoming signal, the output of the loop filter will be either negative or 
positive, depending on whether the VCC is lagging or advanced relative to the 
timing of the input signal. This error signal from the loop filter will provide the 
control signal for adjusting the VCC timing signal so as to drive the frequency 
synthesized pulsed sinusoid to proper synchronism with the received signal. 


13-6 BIBLIOGRAPHICAL NOTES AND REFERENCES 

The introductory treatment of spread spectrum signals and their performance 
that we have given in this chapter is necessarily brief. Detailed and more 
specialized treatments of signal acquisition techniques, code tracking methods, 
and hybrid spread spectrum systems, as well as other general topics on spread 
spectrum signals and systems, can be found in the vast body of technical 
literature that now exists on the subject. 

Historically, the primary application of spread spectrum communications 
has been in the development of secure (AJ) digital communication systems for 
military use. In fact, prior to 1970, most of the work on the design and 
development of spread spectrum communications was classified. Since then, 
this trend has been reversed. The open literature now contains numerous 
publications on all aspects of spread spectrum signal analysis and design. 
Moreover, we have recently seen publications dealing with the application of 
spread spectrum signaling techniques to commercial communications such as 
interoffice radio communications (see Pahlavan, 1985) and mobile-user radio 
communications (see Yue, 1983). 

A historical perspective on the development of spread spectrum com- 
munication systems covering the period 1920-1960 is given in a paper by 
Scholtz (1982). Tutorial treatments focusing on the basic concepts are found in 
the papers by Scholtz (1977) and Pickholtz et al. (1982). These papers also 
contain a large number of references to previous work. In addition, there are 
two papers by Viterbi (1979, 1985) that provide a basic review of the 
performance characteristics of DS and FH signaling techniques. 

Comprehensive treatments of various aspects of analysis and design of 
spread spectrum signals and systems, including synchronization techniques, are 
now available in the texts by Simon et al. (1985), Ziemer and Peterson (1985), 
and Holmes (1982). In addition to these texts, there are several special issues 
of the IEEE Transactions on Communications devoted to spread spectrum 
communications (August 1977 and May 1982) and the IEEE Journal on 
Selected Areas in Communications (September 1985, May 1989, May 1990, and 
June 1993). These issues contain a collection of papers devoted to a variety of 
topics, including multiple access techniques, synchronization techniques, and 
performance analyses with various types of interference. A number of 
important papers that have been published in IEEE journals have also been 
reprinted in book form by the IEEE Press (Dixon, 1976; Cook et al., 1983). 

FIGURE P13-2 

Finally, we recommend the book by Golomb (1967) as a basic reference on 
shift register sequences for the reader who wishes to delve deeper into this 
topic. 


PROBLEMS 


13-1 Following the procedure outlined in Example 13-2-2, determine the error rate 
performance of a DS spread spectrum system in the presence of CW jamming 
when the signal pulse is 


$$g(t) = \ldots, \qquad 0 \le t \le T_c$$


13-2 The sketch in Fig. P13-2 illustrates the power spectral densities of a PN spread 
spectrum signal and narrowband interference in an uncoded (trivial repetition 
code) digital communication system. Referring to Fig. 13-2-6, which shows the 
demodulator for this signal, sketch the (approximate) spectral characteristics of 
the signal and the interference after the multiplication of $r(t)$ with the output of 
the PN generator. Determine the fraction of the total interference that appears at 
the output of the correlator when the number of PN chips per bit is $L_c$. 

13-3 Consider the concatenation of a Reed-Solomon (31,3) (q = 32-ary alphabet) as 
the outer code with a Hadamard (16,5) binary code as the inner code in a DS 
spread spectrum system. Assume that soft-decision decoding is performed on both 
codes. Determine an upper (union) bound on the probability of a bit error based 
on the minimum distance of the concatenated code. 

13-4 The Hadamard $(n, k) = (2^m, m + 1)$ codes are low-rate codes with $d_{\min} = 2^{m-1}$. 
Determine the performance of this class of codes for DS spread spectrum signals 
with binary PSK modulation and either soft-decision or hard-decision decoding. 

13-5 A rate 1/2 convolutional code with $d_{\text{free}} = 10$ is used to encode a data sequence 
occurring at a rate of 1000 bits/s. The modulation is binary PSK. The DS 
spread-spectrum sequence has a chip rate of 10 MHz. 
a Determine the coding gain. 
b Determine the processing gain. 

c Determine the jamming margin assuming an $\mathcal{E}_b/J_0 = 10$. 

13-6 A total of 30 equal-power users are to share a common communication channel by 
CDMA. Each user transmits information at a rate of 10 kbits/s via DS spread- 
spectrum and binary PSK. Determine the minimum chip rate to obtain a bit error 
probability of $10^{-5}$. Additive noise at the receiver may be ignored in this 
computation. 

13-7 A CDMA system is designed based on DS spread spectrum with a processing gain 
of 1000 and binary PSK modulation. Determine the number of users if each user 
has equal power and the desired level of performance is an error probability of 
10 *. Repeat the computation if the processing gain is changed to 500. 

13-8 A DS spread-spectrum system transmits at a rate of 1000 bits/s in the presence of 
a tone jammer. The jammer power is 20 dB greater than the desired signal and the 
required $\mathcal{E}_b/J_0$ to achieve satisfactory performance is 10 dB. 
a Determine the spreading bandwidth required to meet the specifications. 
b If the jammer is a pulse jammer, determine the pulse duty cycle that results in 
worst-case jamming and the corresponding probability of error. 

13-9 A CDMA system consists of 15 equal-power users that transmit information at a 
rate of 10 000 bits/s, each using a DS spread spectrum signal operating at a chip 
rate of 1 MHz. The modulation is binary PSK. 

a Determine the $\mathcal{E}_b/J_0$, where $J_0$ is the spectral density of the combined 
interference. 

b What is the processing gain? 

c How much should the processing gain be increased to allow for doubling the 
number of users without affecting the output SNR? 

13-10 A DS binary PSK spread spectrum signal has a processing gain of 500. What is the 
jamming margin against a continuous-tone jammer if the desired error probability 
is 10 '? 

13-11 Repeat Problem 13-10 if the jammer is a pulsed-noise jammer with a duty cycle of 
1%. 

13-12 Consider the DS spread spectrum signal 

$$c(t) = \sum_{n=-\infty}^{\infty} c_n\, p(t - nT_c)$$

where $c_n$ is a periodic m sequence with a period $N = 127$ and $p(t)$ is a rectangular 
pulse of duration $T_c = 1\ \mu\text{s}$. Determine the power spectral density of the signal 
$c(t)$. 

13-13 Suppose that $\{c_{1i}\}$ and $\{c_{2i}\}$ are two binary (0, 1) periodic sequences with periods $N_1$ 
and $N_2$, respectively. Determine the period of the sequence obtained by forming 
the modulo-2 sum of $\{c_{1i}\}$ and $\{c_{2i}\}$. 

13-14 An m = 10 ML shift register is used to generate the pseudorandom sequence in a 
DS spread spectrum system. The chip duration is $T_c = 1\ \mu\text{s}$, and the bit duration is 
$T_b = NT_c$, where $N$ is the length (period) of the m sequence. 
a Determine the processing gain of the system in dB. 

b Determine the jamming margin if the required and the jammer is a 

tone jammer with an average power $J_{av}$. 

13-15 A FH binary orthogonal FSK system employs an m = 15 stage linear feedback 
shift register that generates an ML sequence. Each state of the shift register selects 
one of $L$ nonoverlapping frequency bands in the hopping pattern. The bit rate is 
100 bits/s and the hop rate is once per bit. The demodulator employs noncoherent 
detection. 

a Determine the hopping bandwidth for this channel. 
b What is the processing gain? 

c What is the probability of error in the presence of AWGN? 

13-16 Consider the FH binary orthogonal FSK system described in Problem 13-15. 
Suppose that the hop rate is increased to 2 hops/bit. The receiver uses square-law 
combining to combine the signal over the two hops. 
a Determine the hopping bandwidth for the channel. 
b What is the processing gain? 

c What is the error probability in the presence of AWGN? 

13-17 In a fast FH spread-spectrum system, the information is transmitted via FSK, with 
noncoherent detection. Suppose there are N = 3 hops/bit, with hard-decision 
decoding of the signal in each hop. 

a Determine the probability of error for this system in an AWGN channel with 
power spectral density and an SNR = 13 dB (total SNR over the three 
hops). 

b Compare the result in (a) with the error probability of a FH spread-spectrum 
system that hops once per bit. 

13-18 A slow FH binary FSK system with noncoherent detection operates at $\mathcal{E}_b/J_0 = 10$, 
with a hopping bandwidth of 2 GHz, and a bit rate of 10 kbits/s. 
a What is the processing gain for the system? 

b If the jammer operates as a partial-band jammer, what is the bandwidth 
occupancy for worst-case jamming? 

c What is the probability of error for the worst-case partial-band jammer? 

13-19 Determine the error probability for a FH spread spectrum signal in which a binary 
convolutional code is used in combination with binary FSK. The interference on 
the channel is AWGN. The FSK demodulator outputs are square-law detected and 
passed to the decoder, which performs optimum soft-decision Viterbi decoding as 
described in Section 8-2. Assume that the hopping rate is 1 hop per coded bit. 

13-20 Repeat Problem 13-19 for hard-decision Viterbi decoding. 

13-21 Repeat Problem 13-19 when fast frequency hopping is performed at a hopping rate 
of L hops per coded bit. 

13-22 Repeat Problem 13-19 when fast frequency hopping is performed with L hops per 
coded bit and the decoder is a hard-decision Viterbi decoder. The L chips per 
coded bit are square-law-detected and combined prior to the hard decision. 

13-23 The TATS signal described in Section 13-3-3 is demodulated by a parallel bank of 
eight matched filters (octal FSK), and each filter output is square-law-detected. 
The eight outputs obtained in each of seven signal intervals (56 total outputs) are 
used to form the 64 possible decision variables corresponding to the Reed- 
Solomon (7,2) code. Determine an upper (union) bound of the code word error 
probability for AWGN and soft-decision decoding. 

13-24 Repeat Problem 13-23 for the worst-case partial-band interference channel. 

13-25 Derive the results in (13-2-62) and (13-2-63) from (13-2-61). 

13-26 Show that (13-3-14) follows from (13-3-13). 

13-27 Derive (13-3-17) from (13-3-16). 

13-28 The generator polynomials for constructing Gold code sequences of length n = 7 
are 


$$g_1(p) = p^3 + p + 1$$
$$g_2(p) = p^3 + p^2 + 1$$


Generate all the Gold codes of length 7 and determine the cross-correlations of 
one sequence with each of the others. 

FIGURE P13-29 



13-29 In Section 13-2-3, we demonstrated techniques for evaluating the error probability 
of a coded system with interleaving in pulse interference by using the cutoff rate 
parameter $R_0$. Use the error probability curves given in Fig. P13-29 for rate 1/2 
and 1/3 convolutional codes with soft-decision Viterbi decoding to determine the 
corresponding error rates for a coded system in pulse interference. Perform this 
computation for K = 3, 5, and 7. 

13-30 In coded and interleaved DS binary PSK modulation with pulse jamming and 
soft-decision decoding, the cutoff rate is 

$$R_0 = 1 - \log_2\left(1 + \alpha e^{-\alpha\mathcal{E}_c/N_0}\right)$$

where $\alpha$ is the fraction of the time the system is being jammed, $\mathcal{E}_c = \mathcal{E}_b R$, $R$ is the 
bit rate, and $N_0 = J_0$. 

a Show that the SNR per bit, $\mathcal{E}_b/N_0$, can be expressed as 

$$\frac{\mathcal{E}_b}{N_0} = \frac{1}{\alpha R}\ln\frac{\alpha}{2^{1-R_0} - 1}$$

b Determine the value of $\alpha$ that maximizes the required $\mathcal{E}_b/N_0$ (worst-case pulse 
jamming) and the resulting maximum value of $\mathcal{E}_b/N_0$. 
c Plot the graph of $10\log(\mathcal{E}_b/rN_0)$ versus $R_0$, where $r = R_0/R$, for worst-case 
pulse jamming and for AWGN ($\alpha = 1$). What conclusions do you reach 
regarding the effect of worst-case pulse jamming? 

FIGURE P13-29 (Continued). 


13-31 In a coded and interleaved frequency-hopped $q$-ary FSK modulation with partial 
band jamming and coherent demodulation with soft-decision decoding, the cutoff 
rate is 

$$R_0 = \log_2\left[\frac{q}{1 + (q-1)\alpha e^{-\alpha\mathcal{E}_c/2N_0}}\right]$$

where $\alpha$ is the fraction of the band being jammed, $\mathcal{E}_c$ is the chip (or tone) energy, 
and $N_0 = J_0$. 

a Show that the SNR per bit can be expressed as 

$$\frac{\mathcal{E}_b}{N_0} = \frac{2}{\alpha R}\ln\frac{(q-1)\alpha}{q2^{-R_0} - 1}$$

b Determine the value of $\alpha$ that maximizes the required $\mathcal{E}_b/N_0$ (worst-case partial 
band jamming) and the resulting maximum value of $\mathcal{E}_b/N_0$. 
c Define $r = R_0/R$ in the result for $\mathcal{E}_b/N_0$ from (b), and plot $10\log(\mathcal{E}_b/rN_0)$ versus 
the normalized cutoff rate $R_0/\log_2 q$ for $q = 2, 4, 8, 16, 32$. Compare these 
graphs with the results of Problem 13-30(c). What conclusions do you reach 
regarding the effect of worst-case partial band jamming? What is the effect of 
increasing the alphabet size $q$? What is the penalty in SNR between the results 
in Problem 13-30(c) and $q$-ary FSK as $q \to \infty$? 



14 


DIGITAL 

COMMUNICATION 
THROUGH FADING 
MULTIPATH CHANNELS 


The previous chapters have described the design and performance of digital 
communications systems for transmission on either the classical AWGN 
channel or a linear filter channel with AWGN. We observed that the distortion 
inherent in linear filter channels requires special signal design techniques and 
rather sophisticated adaptive equalization algorithms in order to achieve good 
performance. 

In this chapter, we consider the signal design, receiver structure, and 
receiver performance for more complex channels, namely, channels having 
randomly time-variant impulse responses. This characterization serves as a 
model for signal transmission over many radio channels such as shortwave 
ionospheric radio communication in the 3-30 MHz frequency band (HF), 
tropospheric scatter (beyond-the-horizon) radio communications in the 300- 
3000 MHz frequency band (UHF) and 3000-30 000 MHz frequency band 
(SHF), and ionospheric forward scatter in the 30-300 MHz frequency band 
(VHF). The time-variant impulse responses of these channels are a conse- 
quence of the constantly changing physical characteristics of the media. For 
example, the ions in the ionospheric layers that reflect the signals transmitted 
in the HF frequency band are always in motion. To the user of the channel, the 
motion of the ions appears to be random. Consequently, if the same signal is 
transmitted at HF in two widely separated time intervals, the two received 
signals will be different. The time-varying responses that occur are treated in 
statistical terms. 

We shall begin our treatment of digital signaling over fading multipath 
channels by first developing a statistical characterization of the channel. Then 
we shall evaluate the performance of several basic digital signaling techniques 
for communication over such channels. The performance results will demonstrate 
the severe penalty in SNR that must be paid as a consequence of the 
fading characteristics of the received signal. We shall then show that the 
penalty in SNR can be dramatically reduced by means of efficient 
modulation/coding and demodulation/decoding techniques. 


14-1 CHARACTERIZATION OF FADING MULTIPATH 
CHANNELS 

If we transmit an extremely short pulse, ideally an impulse, over a time-varying 
multipath channel, the received signal might appear as a train of pulses, as 
shown in Fig. 14-1-1. Hence, one characteristic of a multipath medium is the 
time spread introduced in the signal that is transmitted through the channel. 

A second characteristic is due to the time variations in the structure of the 
medium. As a result of such time variations, the nature of the multipath varies 
with time. That is, if we repeat the pulse-sounding experiment over and over, 
we shall observe changes in the received pulse train, which will include changes 
in the sizes of the individual pulses, changes in the relative delays among the 
pulses, and, quite often, changes in the number of pulses observed in the 
received pulse train as shown in Fig. 14-1-1. Moreover, the time variations 
appear to be unpredictable to the user of the channel. Therefore, it is 
reasonable to characterize the time-variant multipath channel statistically. 


FIGURE 14-1-1 Example of the response of a time-variant 
multipath channel to a very narrow pulse (panels: transmitted signal, received signal). 

Toward this end, let us examine the effects of the channel on a transmitted 
signal that is represented in general as 

$$s(t) = \operatorname{Re}\left[s_l(t)\,e^{j2\pi f_c t}\right] \qquad \text{(14-1-1)}$$

We assume that there are multiple propagation paths. Associated with each 
path is a propagation delay and an attenuation factor. Both the propagation 
delays and the attenuation factors are time-variant as a result of changes in the 
structure of the medium. Thus, the received bandpass signal may be expressed 
in the form 

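$$x(t) = \sum_n \alpha_n(t)\,s(t - \tau_n(t)) \qquad \text{(14-1-2)}$$

where $\alpha_n(t)$ is the attenuation factor for the signal received on the $n$th path 
and $\tau_n(t)$ is the propagation delay for the $n$th path. Substitution for $s(t)$ from 
(14-1-1) into (14-1-2) yields the result 

$$x(t) = \operatorname{Re}\left\{\left[\sum_n \alpha_n(t)\,e^{-j2\pi f_c\tau_n(t)}\,s_l(t - \tau_n(t))\right]e^{j2\pi f_c t}\right\} \qquad \text{(14-1-3)}$$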

It is apparent from (14-1-3) that the equivalent lowpass received signal is 

$$r_l(t) = \sum_n \alpha_n(t)\,e^{-j2\pi f_c\tau_n(t)}\,s_l(t - \tau_n(t)) \qquad \text{(14-1-4)}$$

Since $r_l(t)$ is the response of an equivalent lowpass channel to the equivalent 
lowpass signal $s_l(t)$, it follows that the equivalent lowpass channel is described 
by the time-variant impulse response 

$$c(\tau; t) = \sum_n \alpha_n(t)\,e^{-j2\pi f_c\tau_n(t)}\,\delta(\tau - \tau_n(t)) \qquad \text{(14-1-5)}$$

For some channels, such as the tropospheric scatter channel, it is more 
appropriate to view the received signal as consisting of a continuum of 
multipath components. In such a case, the received signal $x(t)$ is expressed in 
the integral form 

$$x(t) = \int_{-\infty}^{\infty} \alpha(\tau; t)\,s(t - \tau)\,d\tau \qquad \text{(14-1-6)}$$

where $\alpha(\tau; t)$ denotes the attenuation of the signal components at delay $\tau$ and 
at time instant $t$. Now substitution for $s(t)$ from (14-1-1) into (14-1-6) yields 

$$x(t) = \operatorname{Re}\left\{\left[\int_{-\infty}^{\infty} \alpha(\tau; t)\,e^{-j2\pi f_c\tau}\,s_l(t - \tau)\,d\tau\right]e^{j2\pi f_c t}\right\} \qquad \text{(14-1-7)}$$

Since the integral in (14-1-7) represents the convolution of $s_l(t)$ with an 
equivalent lowpass time-variant impulse response $c(\tau; t)$, it follows that 

$$c(\tau; t) = \alpha(\tau; t)\,e^{-j2\pi f_c\tau} \qquad \text{(14-1-8)}$$

where $c(\tau; t)$ represents the response of the channel at time $t$ due to an impulse 
applied at time $t - \tau$. Thus (14-1-8) is the appropriate definition of the 
equivalent lowpass impulse response when the channel results in continuous 
multipath and (14-1-5) is appropriate for a channel that contains discrete 
multipath components. 
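
The discrete-multipath relation (14-1-4) can be evaluated numerically for a sampled lowpass signal. The sketch below makes simplifying assumptions that are not part of the text: the attenuations and delays are frozen (a snapshot of the channel at one instant), each delay is an integer number of samples, and the names `paths`, `fc`, and `fs` are introduced here for illustration.

```python
import cmath
import math

def lowpass_received(s_l, paths, fc, fs):
    """Evaluate r_l[k] = sum_n alpha_n * exp(-j*2*pi*fc*tau_n) * s_l[k - d_n].

    s_l   : list of complex lowpass samples
    paths : list of (alpha_n, d_n) with d_n the delay in samples
    fc    : carrier frequency, fs : sampling rate (same units)
    """
    out = [0j] * len(s_l)
    for alpha, d in paths:
        tau = d / fs                                   # path delay in seconds
        gain = alpha * cmath.exp(-2j * math.pi * fc * tau)
        for k in range(d, len(s_l)):
            out[k] += gain * s_l[k - d]
    return out
```

Feeding a unit impulse into this function returns the sampled form of (14-1-5): one complex tap per path, located at that path's delay.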

Now let us consider the transmission of an unmodulated carrier at frequency 
$f_c$. Then $s_l(t) = 1$ for all $t$, and, hence, the received signal for the case of 
discrete multipath, given by (14-1-4), reduces to 

$$r_l(t) = \sum_n \alpha_n(t)\,e^{-j2\pi f_c\tau_n(t)} = \sum_n \alpha_n(t)\,e^{-j\theta_n(t)} \qquad \text{(14-1-9)}$$

where $\theta_n(t) = 2\pi f_c\tau_n(t)$. Thus, the received signal consists of the sum of a 
number of time-variant vectors (phasors) having amplitudes $\alpha_n(t)$ and phases 
$\theta_n(t)$. Note that large dynamic changes in the medium are required for $\alpha_n(t)$ to 
change sufficiently to cause a significant change in the received signal. On the 
other hand, $\theta_n(t)$ will change by $2\pi$ rad whenever $\tau_n$ changes by $1/f_c$. But $1/f_c$ is 
a small number and, hence, $\theta_n$ can change by $2\pi$ rad with relatively small 
motions of the medium. We also expect the delays $\tau_n(t)$ associated with the 
different signal paths to change at different rates and in an unpredictable 
(random) manner. This implies that the received signal $r_l(t)$ in (14-1-9) can be 
modeled as a random process. When there are a large number of paths, the 
central limit theorem can be applied. That is, $r_l(t)$ may be modeled as a 
complex-valued gaussian random process. This means that the time-variant 
impulse response $c(\tau; t)$ is a complex-valued gaussian random process in the $t$ 
variable. 

The multipath propagation model for the channel embodied in the received 
signal $r_l(t)$, given in (14-1-9), results in signal fading. The fading phenomenon 
is primarily a result of the time variations in the phases $\{\theta_n(t)\}$. That is, the 
randomly time-variant phases $\{\theta_n(t)\}$ associated with the vectors $\{\alpha_n e^{-j\theta_n}\}$ at 
times result in the vectors adding destructively. When that occurs, the resultant 
received signal $r_l(t)$ is very small or practically zero. At other times, the vectors 
$\{\alpha_n e^{-j\theta_n}\}$ add constructively, so that the received signal is large. Thus, the 
amplitude variations in the received signal, termed signal fading, are due to the 
time-variant multipath characteristics of the channel. 

When the impulse response $c(\tau; t)$ is modeled as a zero-mean complex-valued 
gaussian process, the envelope $|c(\tau; t)|$ at any instant $t$ is Rayleigh-distributed. 
In this case the channel is said to be a Rayleigh fading channel. In 
the event that there are fixed scatterers or signal reflectors in the medium, in 
addition to randomly moving scatterers, $c(\tau; t)$ can no longer be modeled as 
having zero mean. In this case, the envelope $|c(\tau; t)|$ has a Rice distribution 
and the channel is said to be a Ricean fading channel. Another probability 
distribution function that has been used to model the envelope of fading 
signals is the Nakagami-m distribution. These fading channel models are 
considered in Section 14-1-2. 


14-1-1 Channel Correlation Functions and Power Spectra 

We shall now develop a number of useful correlation functions and power 
spectral density functions that define the characteristics of a fading multipath 
channel. Our starting point is the equivalent lowpass impulse response $c(\tau; t)$, 
which is characterized as a complex-valued random process in the $t$ variable. 
We assume that $c(\tau; t)$ is wide-sense-stationary. Then we define the autocorrelation 
function of $c(\tau; t)$ as 

$$\phi_c(\tau_1, \tau_2; \Delta t) = \tfrac{1}{2}E[c^*(\tau_1; t)\,c(\tau_2; t + \Delta t)] \qquad \text{(14-1-10)}$$

In most radio transmission media, the attenuation and phase shift of the 
channel associated with path delay $\tau_1$ is uncorrelated with the attenuation and 
phase shift associated with path delay $\tau_2$. This is usually called uncorrelated 
scattering. We make the assumption that the scattering at two different delays 
is uncorrelated and incorporate it into (14-1-10) to obtain 

$$\tfrac{1}{2}E[c^*(\tau_1; t)\,c(\tau_2; t + \Delta t)] = \phi_c(\tau_1; \Delta t)\,\delta(\tau_1 - \tau_2) \qquad \text{(14-1-11)}$$

If we let $\Delta t = 0$, the resulting autocorrelation function $\phi_c(\tau; 0) \equiv \phi_c(\tau)$ is 
simply the average power output of the channel as a function of the time delay 
$\tau$. For this reason, $\phi_c(\tau)$ is called the multipath intensity profile or the delay 
power spectrum of the channel. In general, $\phi_c(\tau; \Delta t)$ gives the average power 
output as a function of the time delay $\tau$ and the difference $\Delta t$ in observation 
time. 

In practice, the function $\phi_c(\tau;\Delta t)$ is measured by transmitting very narrow 
pulses or, equivalently, a wideband signal and cross-correlating the received 
signal with a delayed version of itself. Typically, the measured function $\phi_c(\tau)$ 
may appear as shown in Fig. 14-1-2. The range of values of $\tau$ over which $\phi_c(\tau)$ 
is essentially nonzero is called the multipath spread of the channel and is 
denoted by $T_m$. 

FIGURE 14-1-2 Multipath intensity profile. 

A completely analogous characterization of the time-variant multipath 
channel begins in the frequency domain. By taking the Fourier transform of 
$c(\tau;t)$ we obtain the time-variant transfer function $C(f;t)$, where $f$ is the 
frequency variable. Thus, 

$$C(f;t) = \int_{-\infty}^{\infty} c(\tau;t)\,e^{-j2\pi f\tau}\,d\tau \tag{14-1-12}$$


If $c(\tau;t)$ is modeled as a complex-valued zero-mean gaussian random process 
in the $t$ variable, it follows that $C(f;t)$ also has the same statistics. Under the 
assumption that the channel is wide-sense-stationary, we define the 
autocorrelation function 

$$\phi_C(f_1,f_2;\Delta t) = \tfrac{1}{2}E[C^*(f_1;t)\,C(f_2;t+\Delta t)] \tag{14-1-13}$$

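Since $C(f;t)$ is the Fourier transform of $c(\tau;t)$, it is not surprising to find 
that $\phi_C(f_1,f_2;\Delta t)$ is related to $\phi_c(\tau;\Delta t)$ by the Fourier transform. The 
relationship is easily established by substituting (14-1-12) into (14-1-13). Thus, 

$$\begin{aligned}
\phi_C(f_1,f_2;\Delta t) &= \tfrac{1}{2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E[c^*(\tau_1;t)\,c(\tau_2;t+\Delta t)]\,e^{j2\pi(f_1\tau_1-f_2\tau_2)}\,d\tau_1\,d\tau_2 \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \phi_c(\tau_1;\Delta t)\,\delta(\tau_1-\tau_2)\,e^{j2\pi(f_1\tau_1-f_2\tau_2)}\,d\tau_1\,d\tau_2 \\
&= \int_{-\infty}^{\infty} \phi_c(\tau_1;\Delta t)\,e^{j2\pi(f_1-f_2)\tau_1}\,d\tau_1 \\
&= \int_{-\infty}^{\infty} \phi_c(\tau_1;\Delta t)\,e^{-j2\pi\Delta f\,\tau_1}\,d\tau_1 \equiv \phi_C(\Delta f;\Delta t)
\end{aligned} \tag{14-1-14}$$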


where $\Delta f = f_2 - f_1$. From (14-1-14), we observe that $\phi_C(\Delta f;\Delta t)$ is the Fourier 
transform of the multipath intensity profile. Furthermore, the assumption of 
uncorrelated scattering implies that the autocorrelation function of $C(f;t)$ in 
frequency is a function of only the frequency difference $\Delta f = f_2 - f_1$. Therefore, 
it is appropriate to call $\phi_C(\Delta f;\Delta t)$ the spaced-frequency, spaced-time 
correlation function of the channel. It can be measured in practice by transmitting a 
pair of sinusoids separated by $\Delta f$ and cross-correlating the two separately 
received signals with a relative delay $\Delta t$. 

Suppose we set $\Delta t = 0$ in (14-1-14). Then, with $\phi_C(\Delta f;0) \equiv \phi_C(\Delta f)$ and 
$\phi_c(\tau;0) \equiv \phi_c(\tau)$, the transform relationship is simply 

$$\phi_C(\Delta f) = \int_{-\infty}^{\infty} \phi_c(\tau)\,e^{-j2\pi\Delta f\,\tau}\,d\tau \tag{14-1-15}$$

FIGURE 14-1-3 Relationship between $\phi_C(\Delta f)$ and $\phi_c(\tau)$. (Panels: spaced-frequency correlation function; multipath intensity profile.) 
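The transform pair in (14-1-15) can be checked numerically. The sketch below is only an illustration under stated assumptions: the exponential multipath intensity profile, its 1 microsecond time constant, and the helper names `spaced_freq_corr` and `exp_profile` are hypothetical choices that do not come from the text. For this assumed profile the transform has the closed form $1/(1 + j2\pi\Delta f\,T_0)$, which the midpoint-rule integral should reproduce.

```python
import cmath
import math


def spaced_freq_corr(delta_f, profile, tau_max, n=20000):
    """Midpoint-rule approximation of (14-1-15):
    integral over 0 <= tau <= tau_max of profile(tau) * exp(-j 2 pi delta_f tau)."""
    d_tau = tau_max / n
    total = 0j
    for i in range(n):
        tau = (i + 0.5) * d_tau
        total += profile(tau) * cmath.exp(-2j * math.pi * delta_f * tau)
    return total * d_tau


T0 = 1e-6  # assumed decay constant of the illustrative profile, in seconds


def exp_profile(tau):
    """Assumed unit-power exponential multipath intensity profile."""
    return math.exp(-tau / T0) / T0


if __name__ == "__main__":
    for delta_f in (0.0, 50e3, 159e3, 500e3):
        numeric = spaced_freq_corr(delta_f, exp_profile, tau_max=30 * T0)
        exact = 1 / (1 + 2j * math.pi * delta_f * T0)
        print(f"{delta_f / 1e3:6.0f} kHz  |numeric| = {abs(numeric):.4f}"
              f"  |exact| = {abs(exact):.4f}")
```

For this assumed profile the correlation magnitude falls as the frequency separation approaches and then exceeds $1/T_0$, which is the behavior the coherence-bandwidth argument relies on.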


The relationship is depicted graphically in Fig. 14-1-3. Since $\phi_C(\Delta f)$ is an 
autocorrelation function in the frequency variable, it provides us with a 
measure of the frequency coherence of the channel. As a result of the Fourier 
transform relationship between $\phi_C(\Delta f)$ and $\phi_c(\tau)$, the reciprocal of the 
multipath spread is a measure of the coherence bandwidth of the channel. That 
is, 

$$(\Delta f)_c \approx \frac{1}{T_m} \tag{14-1-16}$$

where $(\Delta f)_c$ denotes the coherence bandwidth. Thus, two sinusoids with 
frequency separation greater than $(\Delta f)_c$ are affected differently by the channel. 
When an information-bearing signal is transmitted through the channel, if 
$(\Delta f)_c$ is small in comparison to the bandwidth of the transmitted signal, the 
channel is said to be frequency-selective. In this case, the signal is severely 
distorted by the channel. On the other hand, if $(\Delta f)_c$ is large in comparison 
with the bandwidth of the transmitted signal, the channel is said to be 
frequency-nonselective. 
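As a rough illustration of how (14-1-16) is used, the sketch below estimates the coherence bandwidth from a multipath spread and labels a signal bandwidth accordingly. The function names, the 1 microsecond multipath spread, and the factor-of-ten margin standing in for "small (or large) in comparison" are all assumptions made for the example, not values from the text.

```python
def coherence_bandwidth(multipath_spread_s):
    """Order-of-magnitude coherence bandwidth in Hz, (Delta f)_c ~ 1/T_m."""
    return 1.0 / multipath_spread_s


def classify_selectivity(signal_bw_hz, multipath_spread_s, margin=10.0):
    """Label the channel for a given signal bandwidth.

    `margin` is an assumed threshold for "much smaller/larger than";
    the text gives no numerical value for it.
    """
    bc = coherence_bandwidth(multipath_spread_s)
    if signal_bw_hz * margin <= bc:
        return "frequency-nonselective"
    if signal_bw_hz >= bc * margin:
        return "frequency-selective"
    return "intermediate"


if __name__ == "__main__":
    t_m = 1e-6  # assumed multipath spread of 1 microsecond
    print(f"coherence bandwidth ~ {coherence_bandwidth(t_m) / 1e6:.1f} MHz")
    for bw in (10e3, 1e6, 20e6):
        print(f"W = {bw / 1e6:7.3f} MHz -> {classify_selectivity(bw, t_m)}")
```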

We now focus our attention on the time variations of the channel as 
measured by the parameter $\Delta t$ in $\phi_C(\Delta f;\Delta t)$. The time variations in the 
channel are evidenced as a Doppler broadening and, perhaps, in addition as a 
Doppler shift of a spectral line. In order to relate the Doppler effects to the 
time variations of the channel, we define the Fourier transform of $\phi_C(\Delta f;\Delta t)$ 
with respect to the variable $\Delta t$ to be the function $S_C(\Delta f;\lambda)$. That is, 

$$S_C(\Delta f;\lambda) = \int_{-\infty}^{\infty} \phi_C(\Delta f;\Delta t)\,e^{-j2\pi\lambda\,\Delta t}\,d\Delta t \tag{14-1-17}$$

With $\Delta f$ set to zero and $S_C(0;\lambda) \equiv S_C(\lambda)$, the relation in (14-1-17) becomes 

$$S_C(\lambda) = \int_{-\infty}^{\infty} \phi_C(\Delta t)\,e^{-j2\pi\lambda\,\Delta t}\,d\Delta t \tag{14-1-18}$$

FIGURE 14-1-4 Relationship between $\phi_C(\Delta t)$ and $S_C(\lambda)$. (Panels: spaced-time correlation function; Doppler power spectrum.) 


The function $S_C(\lambda)$ is a power spectrum that gives the signal intensity as a 
function of the Doppler frequency $\lambda$. Hence, we call $S_C(\lambda)$ the Doppler power 
spectrum of the channel. 

From (14-1-18), we observe that if the channel is time-invariant, $\phi_C(\Delta t) = 1$ 
and $S_C(\lambda)$ becomes equal to the delta function $\delta(\lambda)$. Therefore, when there are 
no time variations in the channel, there is no spectral broadening observed in 
the transmission of a pure frequency tone. 

The range of values of $\lambda$ over which $S_C(\lambda)$ is essentially nonzero is called the 
Doppler spread $B_d$ of the channel. Since $S_C(\lambda)$ is related to $\phi_C(\Delta t)$ by the 
Fourier transform, the reciprocal of $B_d$ is a measure of the coherence time of 
the channel. That is, 

$$(\Delta t)_c \approx \frac{1}{B_d} \tag{14-1-19}$$

where $(\Delta t)_c$ denotes the coherence time. Clearly, a slowly changing channel has 
a large coherence time or, equivalently, a small Doppler spread. Figure 14-1-4 
illustrates the relationship between $\phi_C(\Delta t)$ and $S_C(\lambda)$. 

We have now established a Fourier transform relationship between 
$\phi_C(\Delta f;\Delta t)$ and $\phi_c(\tau;\Delta t)$ involving the variables $(\tau, \Delta f)$, and a Fourier 
transform relationship between $\phi_C(\Delta f;\Delta t)$ and $S_C(\Delta f;\lambda)$ involving the 
variables $(\Delta t, \lambda)$. There are two additional Fourier transform relationships that we 
can define, which serve to relate $\phi_c(\tau;\Delta t)$ to $S_C(\Delta f;\lambda)$ and, thus, close the 
loop. The desired relationship is obtained by defining a new function, denoted 
by $S(\tau;\lambda)$, to be the Fourier transform of $\phi_c(\tau;\Delta t)$ in the $\Delta t$ variable. That is, 

$$S(\tau;\lambda) = \int_{-\infty}^{\infty} \phi_c(\tau;\Delta t)\,e^{-j2\pi\lambda\,\Delta t}\,d\Delta t \tag{14-1-20}$$

It follows that $S(\tau;\lambda)$ and $S_C(\Delta f;\lambda)$ are a Fourier transform pair. That is, 

$$S(\tau;\lambda) = \int_{-\infty}^{\infty} S_C(\Delta f;\lambda)\,e^{j2\pi\tau\,\Delta f}\,d\Delta f \tag{14-1-21}$$



Furthermore, $S(\tau;\lambda)$ and $\phi_C(\Delta f;\Delta t)$ are related by the double Fourier 
transform 

$$S(\tau;\lambda) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \phi_C(\Delta f;\Delta t)\,e^{-j2\pi\lambda\,\Delta t}\,e^{j2\pi\tau\,\Delta f}\,d\Delta t\,d\Delta f \tag{14-1-22}$$

This new function $S(\tau;\lambda)$ is called the scattering function of the channel. It 
provides us with a measure of the average power output of the channel as a 
function of the time delay $\tau$ and the Doppler frequency $\lambda$. 

The relationships among the four functions $\phi_C(\Delta f;\Delta t)$, $\phi_c(\tau;\Delta t)$, 
$S_C(\Delta f;\lambda)$, and $S(\tau;\lambda)$ are summarized in Fig. 14-1-5. 

FIGURE 14-1-5 Relationships among the channel correlation functions and power spectra. [From Green (1962), 
with permission.] 

FIGURE 14-1-6 Scattering function of a medium-range tropospheric scatter channel. The taps delay increment is 
0.1 μs. 


The scattering function $S(\tau;\lambda)$ measured on a 150 mi tropospheric scatter 
link is shown in Fig. 14-1-6. The signal used to probe the channel had a time 
resolution of 0.1 μs. Hence, the time-delay axis is quantized in increments of 
0.1 μs. From the graph, we observe that the multipath spread $T_m$ = 0.7 μs. On 
the other hand, the Doppler spread, which may be defined as the 3 dB 
bandwidth of the power spectrum for each signal path, appears to vary with 
each signal path. For example, in one path it is less than 1 Hz, while in some 
other paths it is several hertz. For our purposes, we shall take the largest of 
these 3 dB bandwidths of the various paths and call that the Doppler spread. 


14-1-2 Statistical Models for Fading Channels 

There are several probability distributions that can be considered in attempting 
to model the statistical characteristics of the fading channel. When there are a 
large number of scatterers in the channel that contribute to the signal at the 
receiver, as is the case in ionospheric or tropospheric signal propagation, 
application of the central limit theorem leads to a gaussian process model for 
the channel impulse response. If the process is zero-mean, then the envelope of 
the channel response at any time instant has a Rayleigh probability distribution 
and the phase is uniformly distributed in the interval $(0, 2\pi)$. That is, 

$$p_R(r) = \frac{2r}{\Omega}\,e^{-r^2/\Omega}, \qquad r \ge 0 \tag{14-1-23}$$

where 

$$\Omega = E(R^2) \tag{14-1-24}$$

We observe that the Rayleigh distribution is characterized by the single 
parameter $E(R^2)$. 
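The link between a zero-mean complex gaussian channel response and the Rayleigh envelope in (14-1-23) can be illustrated with a small simulation. This is a sketch under assumptions: the value $\Omega = 2$, the sample count, the seed, and the helper names are arbitrary illustrative choices. Each quadrature component is drawn with variance $\Omega/2$ so that the envelope has second moment $\Omega$, and the closed-form cdf used for comparison follows from integrating (14-1-23).

```python
import math
import random


def rayleigh_pdf(r, omega):
    """Rayleigh pdf of (14-1-23): (2r/omega) * exp(-r^2/omega) for r >= 0."""
    return 0.0 if r < 0 else (2.0 * r / omega) * math.exp(-r * r / omega)


def rayleigh_cdf(r, omega):
    """Integral of the pdf above from 0 to r."""
    return 0.0 if r < 0 else 1.0 - math.exp(-r * r / omega)


def simulate_envelopes(omega, count, seed=1):
    """Envelopes of zero-mean complex gaussian samples with E(R^2) = omega."""
    rng = random.Random(seed)
    sigma = math.sqrt(omega / 2.0)  # standard deviation per quadrature component
    return [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(count)]


if __name__ == "__main__":
    omega = 2.0  # assumed second moment for the illustration
    samples = simulate_envelopes(omega, 100_000)
    mean_square = sum(r * r for r in samples) / len(samples)
    frac_below_1 = sum(r < 1.0 for r in samples) / len(samples)
    print(f"sample E(R^2) = {mean_square:.3f} (target {omega})")
    print(f"P(R < 1): simulated {frac_below_1:.3f}, "
          f"Rayleigh cdf {rayleigh_cdf(1.0, omega):.3f}")
```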

An alternative statistical model for the envelope of the channel response is 
the Nakagami-m distribution given by the pdf in (2-1-147). In contrast to the 
Rayleigh distribution, which has a single parameter that can be used to match 
the fading channel statistics, the Nakagami-m is a two-parameter distribution, 
namely, involving the parameter $m$ and the second moment $\Omega = E(R^2)$. As a 
consequence, this distribution provides more flexibility and accuracy in 
matching the observed signal statistics. The Nakagami-m distribution can be 
used to model fading channel conditions that are either more or less severe 
than the Rayleigh distribution, and it includes the Rayleigh distribution as a 
special case ($m = 1$). For example, Turin (1972) and Suzuki (1977) have shown 
that the Nakagami-m distribution is the best fit for data signals received in 
urban radio multipath channels. 
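The statement that Rayleigh fading is the $m = 1$ special case can be checked directly. The sketch below assumes the standard form of the Nakagami-m envelope pdf, $p(r) = \frac{2}{\Gamma(m)}\left(\frac{m}{\Omega}\right)^m r^{2m-1} e^{-mr^2/\Omega}$ with $m \ge 1/2$, which is taken here to be what (2-1-147) states; that equation is not reproduced in this section, so treat the formula and the helper names as assumptions.

```python
import math


def nakagami_pdf(r, m, omega):
    """Assumed standard Nakagami-m envelope pdf with second moment omega."""
    if m < 0.5:
        raise ValueError("the Nakagami-m distribution requires m >= 1/2")
    if r < 0:
        return 0.0
    return ((2.0 / math.gamma(m)) * (m / omega) ** m
            * r ** (2 * m - 1) * math.exp(-m * r * r / omega))


def rayleigh_pdf(r, omega):
    """Rayleigh pdf, (2r/omega) * exp(-r^2/omega) for r >= 0."""
    return 0.0 if r < 0 else (2.0 * r / omega) * math.exp(-r * r / omega)


def moment(pdf, order, upper=12.0, n=60000):
    """Midpoint-rule estimate of the integral of r^order * pdf(r) on [0, upper]."""
    h = upper / n
    return sum(((i + 0.5) * h) ** order * pdf((i + 0.5) * h) for i in range(n)) * h


if __name__ == "__main__":
    omega = 1.5  # assumed second moment for the illustration
    for m in (0.5, 1.0, 3.0):
        area = moment(lambda r: nakagami_pdf(r, m, omega), 0)
        second = moment(lambda r: nakagami_pdf(r, m, omega), 2)
        print(f"m = {m}: area = {area:.5f}, E(R^2) = {second:.5f}")
    print("m = 1 matches Rayleigh at r = 0.8:",
          math.isclose(nakagami_pdf(0.8, 1.0, omega), rayleigh_pdf(0.8, omega)))
```

Both parameters are visible here: $\Omega$ fixes the second moment for every $m$, while $m$ controls how concentrated the envelope is around its mean.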

The Rice distribution is also a two-parameter distribution. It may be 
expressed by the pdf given in (2-1-141), where the parameters are $s$ and $\sigma^2$. 
Recall that $s^2$ is called the noncentrality parameter in the equivalent chi-square 
distribution. It represents the power in the nonfading signal components, 
sometimes called specular components, of the received signal. 

There are many radio channels in which fading is encountered that are 
basically line-of-sight (LOS) communication links with multipath components 
arising from secondary reflections, or signal paths, from surrounding terrain. In 
such channels, the number of multipath components is small, and, hence, the 
channel may be modeled in a somewhat simpler form. We cite two channel 
models as examples. 

As the first example, let us consider an airplane to ground communication 
link in which there is the direct path and a single multipath component at a 
delay $\tau_0$ relative to the direct path. The impulse response of such a channel may 
be modeled as 

$$c(\tau;t) = \alpha\,\delta(\tau) + \beta(t)\,\delta(\tau - \tau_0(t)) \tag{14-1-25}$$

where $\alpha$ is the attenuation factor of the direct path and $\beta(t)$ represents the 
time-variant multipath signal component resulting from terrain reflections. 
Often, $\beta(t)$ can be characterized as a zero-mean gaussian random process. The 
transfer function for this channel model may be expressed as 

$$C(f;t) = \alpha + \beta(t)\,e^{-j2\pi f\tau_0(t)} \tag{14-1-26}$$

This channel fits the Ricean fading model defined previously. The direct path 
with attenuation $\alpha$ represents the specular component and $\beta(t)$ represents the 
Rayleigh fading component. 

A similar model has been found to hold for microwave LOS radio channels 
used for long-distance voice and video transmission by telephone companies 
throughout the world. For such channels, Rummler (1979) has developed a 
three-path model based on channel measurements performed on typical LOS 
links in the 6 GHz frequency band. The differential delay on the two multipath 
components is relatively small, and, hence, the model developed by Rummler 
is one that has a channel transfer function 

$$C(f) = \alpha\left[1 - \beta\,e^{-j2\pi(f-f_0)\tau_0}\right] \tag{14-1-27}$$

where $\alpha$ is the overall attenuation parameter, $\beta$ is called a shape parameter 
which is due to the multipath components, $f_0$ is the frequency of the fade 
minimum, and $\tau_0$ is the relative time delay between the direct and the 
multipath components. This simplified model was used to fit data derived from 
channel measurements. 

Rummler found that the parameters $\alpha$ and $\beta$ may be characterized as 
random variables that, for practical purposes, are nearly statistically 
independent. From the channel measurements, he found that the distribution of $\beta$ has 
the form $(1 - \beta)^{2.3}$. The distribution of $\alpha$ is well modeled by the lognormal 
distribution, i.e., $-\log\alpha$ is gaussian. For $\beta > 0.5$, the mean of $-20\log\alpha$ was 
found to be 25 dB and the standard deviation was 5 dB. For smaller values of 
$\beta$, the mean decreases to 15 dB. The delay parameter determined from the 
measurements was $\tau_0 = 6.3$ ns. The magnitude-square response of $C(f)$ is 

$$|C(f)|^2 = \alpha^2\left[1 + \beta^2 - 2\beta\cos 2\pi(f-f_0)\tau_0\right] \tag{14-1-28}$$

$|C(f)|$ is plotted in Fig. 14-1-7 as a function of the frequency $f - f_0$ for 
$\tau_0 = 6.3$ ns. Note that the effect of the multipath component is to create a deep 
attenuation at $f = f_0$ and at multiples of $1/\tau_0 \approx 159$ MHz. By comparison, the 
typical channel bandwidth is 30 MHz. This model was used by Lundgren and 
Rummler (1979) to determine the error rate performance of digital radio 
systems. 
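The two expressions (14-1-27) and (14-1-28) can be evaluated side by side to see the notch structure. In this sketch $\tau_0 = 6.3$ ns is the value quoted above, while $\alpha = 1$, $\beta = 0.9$ and the function names are assumed purely for illustration.

```python
import cmath
import math


def rummler_transfer(f_offset_hz, alpha, beta, tau0_s):
    """Transfer function (14-1-27), with f_offset_hz standing for f - f0."""
    return alpha * (1 - beta * cmath.exp(-2j * math.pi * f_offset_hz * tau0_s))


def rummler_power(f_offset_hz, alpha, beta, tau0_s):
    """Magnitude-square response (14-1-28)."""
    return alpha ** 2 * (1 + beta ** 2
                         - 2 * beta * math.cos(2 * math.pi * f_offset_hz * tau0_s))


TAU0 = 6.3e-9            # delay parameter quoted in the text
ALPHA, BETA = 1.0, 0.9   # assumed illustrative values
NOTCH_SPACING = 1.0 / TAU0

if __name__ == "__main__":
    print(f"notch spacing 1/tau0 = {NOTCH_SPACING / 1e6:.1f} MHz")
    for offset in (0.0, 15e6, NOTCH_SPACING / 2, NOTCH_SPACING):
        power_db = 10 * math.log10(rummler_power(offset, ALPHA, BETA, TAU0))
        print(f"f - f0 = {offset / 1e6:7.2f} MHz: |C|^2 = {power_db:6.2f} dB")
```

With these assumed parameters the response at $f = f_0$ is $\alpha^2(1-\beta)^2$, and the same depth recurs one notch spacing away.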


FIGURE 14-1-7 Magnitude frequency response of LOS channel model. 


14-2 THE EFFECT OF SIGNAL CHARACTERISTICS 
ON THE CHOICE OF A CHANNEL MODEL 

Having discussed the statistical characterization of time-variant multipath 
channels generally in terms of the correlation functions described in Section 
14-1, we now consider the effect of signal characteristics on the selection of a 
channel model that is appropriate for the specified signal. Thus, let $s_l(t)$ be the 
equivalent lowpass signal transmitted over the channel and let $S_l(f)$ denote its 
frequency content. Then the equivalent lowpass received signal, exclusive of 
additive noise, may be expressed either in terms of the time domain variables 
$c(\tau;t)$ and $s_l(t)$ as 

$$r_l(t) = \int_{-\infty}^{\infty} c(\tau;t)\,s_l(t-\tau)\,d\tau \tag{14-2-1}$$

or in terms of the frequency functions $C(f;t)$ and $S_l(f)$ as 

$$r_l(t) = \int_{-\infty}^{\infty} C(f;t)\,S_l(f)\,e^{j2\pi ft}\,df \tag{14-2-2}$$

Suppose we are transmitting digital information over the channel by 
modulating (either in amplitude, or in phase, or both) the basic pulse $s_l(t)$ at a 
rate $1/T$, where $T$ is the signaling interval. It is apparent from (14-2-2) that the 
time-variant channel characterized by the transfer function $C(f;t)$ distorts the 
signal $S_l(f)$. If $S_l(f)$ has a bandwidth $W$ greater than the coherence bandwidth 
$(\Delta f)_c$ of the channel, $S_l(f)$ is subjected to different gains and phase shifts across 
the band. In such a case, the channel is said to be frequency-selective. 
Additional distortion is caused by the time variations in $C(f;t)$. This type of 
distortion is evidenced as a variation in the received signal strength, and has 
been termed fading. It should be emphasized that the frequency selectivity and 
fading are viewed as two different types of distortion. The former depends on 
the multipath spread or, equivalently, on the coherence bandwidth of the 
channel relative to the transmitted signal bandwidth $W$. The latter depends on 
the time variations of the channel, which are grossly characterized by the 
coherence time $(\Delta t)_c$ or, equivalently, by the Doppler spread $B_d$. 

The effect of the channel on the transmitted signal $s_l(t)$ is a function of our 
choice of signal bandwidth and signal duration. For example, if we select the 
signaling interval $T$ to satisfy the condition $T \gg T_m$, the channel introduces a 
negligible amount of intersymbol interference. If the bandwidth of the signal 
pulse $s_l(t)$ is $W \approx 1/T$, the condition $T \gg T_m$ implies that 

$$W \ll \frac{1}{T_m} \approx (\Delta f)_c \tag{14-2-3}$$

That is, the signal bandwidth $W$ is much smaller than the coherence bandwidth 
of the channel. Hence, the channel is frequency-nonselective. In other words, 
all of the frequency components in $S_l(f)$ undergo the same attenuation and 
phase shift in transmission through the channel. But this implies that, within 
the bandwidth occupied by $S_l(f)$, the time-variant transfer function $C(f;t)$ of 
the channel is a complex-valued constant in the frequency variable. Since $S_l(f)$ 
has its frequency content concentrated in the vicinity of $f = 0$, $C(f;t) = C(0;t)$. 
Consequently, (14-2-2) reduces to 

$$r_l(t) = C(0;t)\int_{-\infty}^{\infty} S_l(f)\,e^{j2\pi ft}\,df = C(0;t)\,s_l(t) \tag{14-2-4}$$

Thus, when the signal bandwidth $W$ is much smaller than the coherence 
bandwidth $(\Delta f)_c$ of the channel, the received signal is simply the transmitted 
signal multiplied by a complex-valued random process $C(0;t)$, which 
represents the time-variant characteristics of the channel. In this case, we say that 
the multipath components in the received signal are not resolvable because 
$W \ll (\Delta f)_c$. 

The transfer function $C(0;t)$ for a frequency-nonselective channel may be 
expressed in the form 

$$C(0;t) = \alpha(t)\,e^{-j\phi(t)} \tag{14-2-5}$$

where $\alpha(t)$ represents the envelope and $\phi(t)$ represents the phase of the 
equivalent lowpass channel. When $C(0;t)$ is modeled as a zero-mean 
complex-valued gaussian random process, the envelope $\alpha(t)$ is Rayleigh-distributed for 
any fixed value of $t$ and $\phi(t)$ is uniformly distributed over the interval $(-\pi, \pi)$. 
The rapidity of the fading on the frequency-nonselective channel is determined 
either from the correlation function $\phi_C(\Delta t)$ or from the Doppler power 
spectrum $S_C(\lambda)$. Alternatively, either of the channel parameters $(\Delta t)_c$ or $B_d$ can 
be used to characterize the rapidity of the fading. 

For example, suppose it is possible to select the signal bandwidth $W$ to 
satisfy the condition $W \ll (\Delta f)_c$ and the signaling interval $T$ to satisfy the 
condition $T \ll (\Delta t)_c$. Since $T$ is smaller than the coherence time of the channel, 
the channel attenuation and phase shift are essentially fixed for the duration of 
at least one signaling interval. When this condition holds, we call the channel a 
slowly fading channel. Furthermore, when $W \approx 1/T$, the conditions that the 
channel be frequency-nonselective and slowly fading imply that the product of 
$T_m$ and $B_d$ must satisfy the condition $T_m B_d < 1$. 

The product $T_m B_d$ is called the spread factor of the channel. If $T_m B_d < 1$, the 
channel is said to be underspread; otherwise, it is overspread. The multipath 
spread, the Doppler spread, and the spread factor are listed in Table 14-2-1 for 
several channels. We observe from this table that several radio channels, 
including the moon when used as a passive reflector, are underspread. 
Consequently, it is possible to select the signal $s_l(t)$ such that these channels 
are frequency-nonselective and slowly fading. The slow-fading condition 
implies that the channel characteristics vary sufficiently slowly that they can be 
measured. 

TABLE 14-2-1 MULTIPATH SPREAD, DOPPLER SPREAD, AND SPREAD FACTOR 
FOR SEVERAL TIME-VARIANT MULTIPATH CHANNELS 

Type of channel                          Multipath        Doppler        Spread 
                                         duration (s)     spread (Hz)    factor 

Shortwave ionospheric                    10^-3 to 10^-2   10^-1 to 1     10^-4 to 10^-2 
  propagation (HF) 
Ionospheric propagation                  10^-3 to 10^-2   10 to 100      10^-2 to 1 
  under disturbed auroral 
  conditions (HF) 
Ionospheric forward scatter (VHF)        10^-4            10             10^-3 
Tropospheric scatter (SHF)               10^-6            10             10^-5 
Orbital scatter (X band)                 10^-4            10^3           10^-1 
Moon at max. libration                   10^-2            10             10^-1 
  (f_0 = 0.4 kmc) 
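The underspread test is a one-line product, sketched below for three single-valued rows of Table 14-2-1 (ionospheric forward scatter, tropospheric scatter, and the moon at maximum libration). The final entry is a hypothetical overspread channel invented for contrast; it does not appear in the table, and the function names are likewise illustrative.

```python
def spread_factor(multipath_spread_s, doppler_spread_hz):
    """Spread factor T_m * B_d (dimensionless)."""
    return multipath_spread_s * doppler_spread_hz


def is_underspread(multipath_spread_s, doppler_spread_hz):
    """True when T_m * B_d < 1."""
    return spread_factor(multipath_spread_s, doppler_spread_hz) < 1.0


CHANNELS = {
    "ionospheric forward scatter (VHF)": (1e-4, 10.0),
    "tropospheric scatter (SHF)": (1e-6, 10.0),
    "moon at max. libration": (1e-2, 10.0),
    "hypothetical overspread channel": (1e-1, 50.0),  # invented, not in the table
}

if __name__ == "__main__":
    for name, (t_m, b_d) in CHANNELS.items():
        label = "underspread" if is_underspread(t_m, b_d) else "overspread"
        print(f"{name:36s} T_m*B_d = {spread_factor(t_m, b_d):8.1e}  {label}")
```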

In Section 14-3, we shall determine the error rate performance for binary 
signaling over a frequency-nonselective slowly fading channel. This channel 
model is, by far, the simplest to analyze. More importantly, it yields insight 
into the performance characteristics for digital signaling on a fading channel 
and serves to suggest the type of signal waveforms that are effective in 
overcoming the fading caused by the channel. 

Since the multipath components in the received signal are not resolvable 
when the signal bandwidth $W$ is less than the coherence bandwidth $(\Delta f)_c$ of the 
channel, the received signal appears to arrive at the receiver via a single fading 
path. On the other hand, we may choose $W \gg (\Delta f)_c$, so that the channel 
becomes frequency-selective. We shall show later that, under this condition, 
the multipath components in the received signal are resolvable with a 
resolution in time delay of $1/W$. Thus, we shall illustrate that the 
frequency-selective channel can be modeled as a tapped delay line (transversal) filter with 
time-variant tap coefficients. We shall then derive the performance of binary 
signaling over such a frequency-selective channel model. 


14-3 FREQUENCY-NONSELECTIVE, SLOWLY 
FADING CHANNEL 

In this section, we derive the error rate performance of binary PSK and binary 
FSK when these signals are transmitted over a frequency-nonselective, slowly 
fading channel. As described in Section 14-2, the frequency-nonselective 
channel results in multiplicative distortion of the transmitted signal $s_l(t)$. 
Furthermore, the condition that the channel fades slowly implies that the 
multiplicative process may be regarded as a constant during at least one 
signaling interval. Consequently, if the transmitted signal is $s_l(t)$, the received 
equivalent lowpass signal in one signaling interval is 

$$r_l(t) = \alpha\,e^{-j\phi}\,s_l(t) + z(t), \qquad 0 \le t \le T \tag{14-3-1}$$

where $z(t)$ represents the complex-valued white gaussian noise process 
corrupting the signal. 

Let us assume that the channel fading is sufficiently slow that the phase shift 
$\phi$ can be estimated from the received signal without error. In that case, we can 
achieve ideal coherent detection of the received signal. Thus, the received 
signal can be processed by passing it through a matched filter in the case of 
binary PSK or through a pair of matched filters in the case of binary FSK. One 
method that we can use to determine the performance of the binary 
communications systems is to evaluate the decision variables and from these 
determine the probability of error. However, we have already done this for a 
fixed (time-invariant) channel. That is, for a fixed attenuation $\alpha$, we have 
previously derived the probability of error for binary PSK and binary FSK. 
From (5-2-5), the expression for the error rate of binary PSK as a function of 
the received SNR $\gamma_b$ is 

$$P_2(\gamma_b) = Q\left(\sqrt{2\gamma_b}\right) \tag{14-3-2}$$

where $\gamma_b = \alpha^2\mathcal{E}_b/N_0$. The expression for the error rate of binary FSK, detected 
coherently, is given by (5-2-10) as 

$$P_2(\gamma_b) = Q\left(\sqrt{\gamma_b}\right) \tag{14-3-3}$$

We view (14-3-2) and (14-3-3) as conditional error probabilities, where the 
condition is that $\alpha$ is fixed. To obtain the error probabilities when $\alpha$ is random, 
we must average $P_2(\gamma_b)$, given in (14-3-2) and (14-3-3), over the probability 
density function of $\gamma_b$. That is, we must evaluate the integral 

$$P_2 = \int_0^{\infty} P_2(\gamma_b)\,p(\gamma_b)\,d\gamma_b \tag{14-3-4}$$

where $p(\gamma_b)$ is the probability density function of $\gamma_b$ when $\alpha$ is random. 

Rayleigh Fading   Since $\alpha$ is Rayleigh-distributed, $\alpha^2$ has a chi-square 
probability distribution with two degrees of freedom. Consequently, $\gamma_b$ also is 
chi-square-distributed. It is easily shown that 

$$p(\gamma_b) = \frac{1}{\bar{\gamma}_b}\,e^{-\gamma_b/\bar{\gamma}_b}, \qquad \gamma_b \ge 0 \tag{14-3-5}$$

where $\bar{\gamma}_b$ is the average signal-to-noise ratio, defined as 

$$\bar{\gamma}_b = \frac{\mathcal{E}_b}{N_0}\,E(\alpha^2) \tag{14-3-6}$$

The term $E(\alpha^2)$ is simply the average value of $\alpha^2$. 

Now we can substitute (14-3-5) into (14-3-4) and carry out the integration 
for $P_2(\gamma_b)$ as given by (14-3-2) and (14-3-3). The result of this integration for 
binary PSK is 

$$P_2 = \frac{1}{2}\left(1 - \sqrt{\frac{\bar{\gamma}_b}{1+\bar{\gamma}_b}}\right) \tag{14-3-7}$$

If we repeat the integration with $P_2(\gamma_b)$ given by (14-3-3), we obtain the 
probability of error for binary FSK, detected coherently, in the form 

$$P_2 = \frac{1}{2}\left(1 - \sqrt{\frac{\bar{\gamma}_b}{2+\bar{\gamma}_b}}\right) \tag{14-3-8}$$
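The averaging in (14-3-4) can be carried out numerically and compared with the closed forms for coherent PSK and coherent FSK on a Rayleigh fading channel. This is a sketch under assumptions: the midpoint rule, the truncation of the integral at $\gamma_b = 60$ (beyond which the Q-function factor is negligible), the step count, and all function names are illustrative choices, and the closed forms are the standard Rayleigh-fading results $\frac{1}{2}\left(1-\sqrt{\bar{\gamma}_b/(1+\bar{\gamma}_b)}\right)$ and $\frac{1}{2}\left(1-\sqrt{\bar{\gamma}_b/(2+\bar{\gamma}_b)}\right)$.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x), expressed through erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def average_over_rayleigh(cond_error, mean_snr, upper=60.0, n=60000):
    """Midpoint-rule average of cond_error(snr) over the exponential pdf
    (1/mean_snr) * exp(-snr/mean_snr), truncated at snr = upper."""
    h = upper / n
    total = 0.0
    for i in range(n):
        g = (i + 0.5) * h
        total += cond_error(g) * math.exp(-g / mean_snr) / mean_snr
    return total * h


def psk_rayleigh(mean_snr):
    """Closed-form average error probability, coherent binary PSK."""
    return 0.5 * (1.0 - math.sqrt(mean_snr / (1.0 + mean_snr)))


def fsk_rayleigh(mean_snr):
    """Closed-form average error probability, coherent orthogonal binary FSK."""
    return 0.5 * (1.0 - math.sqrt(mean_snr / (2.0 + mean_snr)))


if __name__ == "__main__":
    for snr_db in (0, 10, 20):
        g = 10 ** (snr_db / 10)
        psk_num = average_over_rayleigh(lambda x: q_function(math.sqrt(2 * x)), g)
        fsk_num = average_over_rayleigh(lambda x: q_function(math.sqrt(x)), g)
        print(f"{snr_db:2d} dB  PSK {psk_num:.5e} / {psk_rayleigh(g):.5e}"
              f"   FSK {fsk_num:.5e} / {fsk_rayleigh(g):.5e}")
```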


In arriving at the error rate results in (14-3-7) and (14-3-8), we have 
assumed that the estimate of the channel phase shift, obtained in the presence 
of slow fading, is noiseless. Such an ideal condition may not hold in practice. In 
such a case, the expressions in (14-3-7) and (14-3-8) should be viewed as 
representing the best achievable performance in the presence of Rayleigh 
fading. In Appendix C we consider the problem of estimating the phase in the 
presence of noise and we evaluate the error rate performance of binary and 
multiphase PSK. 

On channels for which the fading is sufficiently rapid to preclude the 
estimation of a stable phase reference by averaging the received signal phase 
over many signaling intervals, DPSK is an alternative signaling method. Since 
DPSK requires phase stability over only two consecutive signaling intervals, 
this modulation technique is quite robust in the presence of signal fading. In 
deriving the performance of binary DPSK for a fading channel, we begin again 
with the error probability for a nonfading channel, which is 

$$P_2(\gamma_b) = \tfrac{1}{2}\,e^{-\gamma_b} \tag{14-3-9}$$

This expression is substituted into the integral in (14-3-4) along with $p(\gamma_b)$ 
obtained from (14-3-5). Evaluation of the resulting integral yields the 
probability of error for binary DPSK, in the form 

$$P_2 = \frac{1}{2(1+\bar{\gamma}_b)} \tag{14-3-10}$$


If we choose not to estimate the channel phase shift at all, but instead 
employ a noncoherent (envelope or square-law) detector with binary, 
orthogonal FSK signals, the error probability for a nonfading channel is 

$$P_2(\gamma_b) = \tfrac{1}{2}\,e^{-\gamma_b/2} \tag{14-3-11}$$

When we average $P_2(\gamma_b)$ over the Rayleigh fading channel attenuation, the 
resulting error probability is 

$$P_2 = \frac{1}{2+\bar{\gamma}_b} \tag{14-3-12}$$



FIGURE 14-3-1 Performance of binary signaling on a Rayleigh fading channel. 

The error probabilities in (14-3-7), (14-3-8), (14-3-10), and (14-3-12) are 
illustrated in Fig. 14-3-1. In comparing the performance of the four binary 
signaling systems, we focus our attention on the probabilities of error for large 
SNR, i.e., $\bar{\gamma}_b \gg 1$. Under this condition, the error rates in (14-3-7), (14-3-8), 
(14-3-10), and (14-3-12) simplify to 

$$P_2 \approx \begin{cases}
1/4\bar{\gamma}_b & \text{for coherent PSK} \\
1/2\bar{\gamma}_b & \text{for coherent, orthogonal FSK} \\
1/2\bar{\gamma}_b & \text{for DPSK} \\
1/\bar{\gamma}_b & \text{for noncoherent, orthogonal FSK}
\end{cases} \tag{14-3-13}$$

From (14-3-13), we observe that coherent PSK is 3 dB better than DPSK 
and 6 dB better than noncoherent FSK. More striking, however, is the 
observation that the error rates decrease only inversely with SNR. In contrast, 
the decrease in error rate on a nonfading channel is exponential with SNR. 
This means that, on a fading channel, the transmitter must transmit a large 
amount of power in order to obtain a low probability of error. In many cases, a 
large amount of power is not possible, technically and/or economically. An 
alternative solution to the problem of obtaining acceptable performance on a 
fading channel is the use of redundancy, which can be obtained by means of 
diversity techniques, as discussed in Section 14-4. 

Nakagami Fading   If $\alpha$ is characterized statistically by the Nakagami-m 
distribution, the random variable $\gamma = \alpha^2\mathcal{E}_b/N_0$ has the pdf (see Problem 14-15) 

$$p(\gamma) = \frac{m^m}{\Gamma(m)\,\bar{\gamma}^m}\,\gamma^{m-1}\,e^{-m\gamma/\bar{\gamma}} \tag{14-3-14}$$

where $\bar{\gamma} = E(\alpha^2)\,\mathcal{E}_b/N_0$. 

The average probability of error for any of the modulation methods is 
simply obtained by averaging the appropriate error probability for a nonfading 
channel over the fading signal statistics. 

As an example of the performance obtained with Nakagami-m fading 
statistics, Fig. 14-3-2 illustrates the probability of error of binary PSK with $m$ as 
a parameter. We recall that $m = 1$ corresponds to Rayleigh fading. We observe 
that the performance improves as $m$ is increased above $m = 1$, which is 
indicative of the fact that the fading is less severe. On the other hand, when 
$m < 1$, the performance is worse than Rayleigh fading. 
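The trend with $m$ can be reproduced by averaging the nonfading PSK error probability $Q(\sqrt{2\gamma})$ over the gamma-type pdf $p(\gamma) = \frac{m^m}{\Gamma(m)\bar{\gamma}^m}\gamma^{m-1}e^{-m\gamma/\bar{\gamma}}$. The sketch below is an assumption-laden illustration: it substitutes $\gamma = u^2$ so that the integrand stays bounded at the origin for $m \ge 1/2$, uses a midpoint rule truncated at $u = 8$, and picks 10 dB as the average SNR; none of these numerical choices come from the text.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def psk_nakagami(mean_snr, m, u_max=8.0, n=20000):
    """Average of Q(sqrt(2*snr)) over the Nakagami-m SNR pdf.

    With snr = u**2 the integrand becomes
    Q(sqrt(2)*u) * 2 * m**m * u**(2m-1) * exp(-m*u**2/mean_snr)
    / (Gamma(m) * mean_snr**m), which is bounded at u = 0 for m >= 1/2.
    """
    if m < 0.5:
        raise ValueError("the Nakagami-m distribution requires m >= 1/2")
    scale = 2.0 * m ** m / (math.gamma(m) * mean_snr ** m)
    h = u_max / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += (q_function(math.sqrt(2.0) * u) * u ** (2 * m - 1)
                  * math.exp(-m * u * u / mean_snr))
    return scale * total * h


if __name__ == "__main__":
    mean_snr = 10.0  # assumed average SNR of 10 dB
    rayleigh = 0.5 * (1 - math.sqrt(mean_snr / (1 + mean_snr)))
    for m in (0.5, 1.0, 2.0, 4.0):
        print(f"m = {m}: P2 = {psk_nakagami(mean_snr, m):.4e}")
    print(f"Rayleigh closed form (m = 1): {rayleigh:.4e}")
```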

Other Fading Signal Statistics   Following the procedure described above, 
one can determine the performance of the various modulation methods for 
other types of fading signal statistics, such as the Rice distribution. 

Error probability results for Rice-distributed fading statistics can be found 
in the paper by Lindsey (1964), while for Nakagami-m fading statistics, the 
reader may refer to the papers by Esposito (1967), Miyagaki et al. (1978), 
Charash (1979), Al-Hussaini et al. (1985), and Beaulieu et al. (1991). 

14-4 DIVERSITY TECHNIQUES FOR FADING 
MULTIPATH CHANNELS 

Diversity techniques are based on the notion that errors occur in reception 
when the channel attenuation is large, i.e., when the channel is in a deep fade. 
If we can supply to the receiver several replicas of the same information signal 
transmitted over independently fading channels, the probability that all the 
signal components will fade simultaneously is reduced considerably. That is, if 
$p$ is the probability that any one signal will fade below some critical value, then 
$p^L$ is the probability that all $L$ independently fading replicas of the same signal 
will fade below the critical value. There are several ways in which we can 
provide the receiver with $L$ independently fading replicas of the same 
information-bearing signal. 

One method is to employ frequency diversity. That is, the same 
information-bearing signal is transmitted on $L$ carriers, where the separation between 
successive carriers equals or exceeds the coherence bandwidth $(\Delta f)_c$ of the 
channel. 

A second method for achieving $L$ independently fading versions of the same 
information-bearing signal is to transmit the signal in $L$ different time slots, 
where the separation between successive time slots equals or exceeds the 
coherence time $(\Delta t)_c$ of the channel. This method is called time diversity. 

Note that the fading channel fits the model of a bursty error channel. 
Furthermore, we may view the transmission of the same information either at 
different frequencies or in different time slots (or both) as a simple form of 
repetition coding. The separation of the diversity transmissions in time by $(\Delta t)_c$ 
or in frequency by $(\Delta f)_c$ is basically a form of block-interleaving the bits in the 
repetition code in an attempt to break up the error bursts and, thus, to obtain 
independent errors. Later in the chapter, we shall demonstrate that, in general, 
repetition coding is wasteful of bandwidth when compared with nontrivial 
coding. 

Another commonly used method for achieving diversity employs multiple 
antennas. For example, we may employ a single transmitting antenna and 
multiple receiving antennas. The latter must be spaced sufficiently far apart 
that the multipath components in the signal have significantly different 
propagation delays at the antennas. Usually a separation of at least 10 
wavelengths is required between two antennas in order to obtain signals that 
fade independently. 

A more sophisticated method for obtaining diversity is based on the use of a 
signal having a bandwidth much greater than the coherence bandwidth $(\Delta f)_c$ of 
the channel. Such a signal with bandwidth $W$ will resolve the multipath 
components and, thus, provide the receiver with several independently fading 
signal paths. The time resolution is $1/W$. Consequently, with a multipath 
spread of $T_m$ seconds, there are $T_m W$ resolvable signal components. Since 
$T_m \approx 1/(\Delta f)_c$, the number of resolvable signal components may also be expressed as 
$W/(\Delta f)_c$. Thus, the use of a wideband signal may be viewed as just another 
method for obtaining frequency diversity of order $L \approx W/(\Delta f)_c$. The optimum 
receiver for processing the wideband signal will be derived in Section 14-5. It is 
called a RAKE correlator or a RAKE matched filter and was invented by Price 
and Green (1958). 
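The count of resolvable components is simple arithmetic, sketched below. The rounding to the nearest integer, the floor of one path, the function names, and the example numbers (a 0.7 microsecond multipath spread probed with 10 MHz and with 100 kHz of bandwidth) are assumptions chosen for illustration.

```python
def time_resolution(bandwidth_hz):
    """Delay resolution 1/W of a signal with bandwidth W, in seconds."""
    return 1.0 / bandwidth_hz


def resolvable_paths(multipath_spread_s, bandwidth_hz):
    """Approximate diversity order L ~ T_m * W, rounded, with at least one path."""
    return max(1, round(multipath_spread_s * bandwidth_hz))


if __name__ == "__main__":
    t_m = 0.7e-6  # assumed multipath spread in seconds
    for w in (10e6, 100e3):
        print(f"W = {w / 1e6:6.2f} MHz: resolution = {time_resolution(w) * 1e6:.2f} us, "
              f"L ~ {resolvable_paths(t_m, w)}")
```

When $T_m W$ is well below one, the sketch reports a single path, which corresponds to the frequency-nonselective case in which the multipath components cannot be resolved.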

There are other diversity techniques that have received some consideration 
in practice, such as angle-of-arrival diversity and polarization diversity. 
However, these have not been as widely used as those described above. 

14-4-1 Binary Signals 

We shall now determine the error rate performance for a binary digital 
communications system with diversity. We begin by describing the mathemati- 
cal model for the communications system with diversity. First of all, we assume 
that there are L diversity channels, carrying the same information-bearing 
signal. Each channel is assumed to be frequency-nonselective and slowly fading 
with Rayleigh-distributed envelope statistics. The fading processes among the 
L diversity channels are assumed to be mutually statistically independent. The 
signal in each channel is corrupted by an additive zero-mean white gaussian 
noise process. The noise processes in the L channels are assumed to be 
mutually statistically independent, with identical autocorrelation functions. 
Thus, the equivalent low-pass received signals for the L channels can be 
expressed in the form 

$$ r_{lk}(t) = \alpha_k e^{-j\phi_k} s_{km}(t) + z_k(t), \quad k = 1, 2, \ldots, L, \quad m = 1, 2 \tag{14-4-1} $$

where $\{\alpha_k e^{-j\phi_k}\}$ represent the attenuation factors and phase
shifts for the $L$ channels, $s_{km}(t)$ denotes the $m$th signal transmitted
on the $k$th channel, and $z_k(t)$ denotes the additive white gaussian noise
on the $k$th channel. All signals in the set $\{s_{km}(t)\}$ have the same
energy.

The optimum demodulator for the signal received from the $k$th channel
consists of two matched filters, one having the impulse response

$$ b_{k1}(t) = s_{k1}^*(T - t) \tag{14-4-2} $$

and the other having the impulse response

$$ b_{k2}(t) = s_{k2}^*(T - t) \tag{14-4-3} $$

Of course, if binary PSK is the modulation method used to transmit the
information, then $s_{k1}(t) = -s_{k2}(t)$. Consequently, only a single
matched filter is required for binary PSK. Following the matched filters is a
combiner that forms the two decision variables. The combiner that achieves the
best performance is one in which each matched filter output is multiplied by
the corresponding complex-valued (conjugate) channel gain
$\alpha_k e^{j\phi_k}$. The effect of this multiplication is to compensate for
the phase shift in the channel and to weight the signal by a factor that is
proportional to the signal strength. Thus, a strong signal carries a larger
weight than a weak signal. After the complex-valued weighting operation is
performed, two sums are formed. One consists of the real parts of the weighted
outputs from the matched filters corresponding to a transmitted 0. The second
consists of the real parts of the outputs from the matched filters
corresponding to a transmitted 1. This optimum combiner is called a maximal
ratio combiner by Brennan (1959). Of course, the realization of this optimum
combiner is based on the assumption that the channel attenuations
$\{\alpha_k\}$ and the phase shifts $\{\phi_k\}$ are known perfectly. That is,
the estimates of the parameters $\{\alpha_k\}$ and $\{\phi_k\}$ contain no
noise. (The effect of noisy estimates on the error rate performance of
multiphase PSK is considered in Appendix C.)

FIGURE 14-4-1 Model of binary digital communications system with diversity.

A block diagram illustrating the model for the binary digital communica- 
tions system described above is shown in Fig. 14-4-1. 

Let us first consider the performance of binary PSK with Lth-order 
diversity. The output of the maximal ratio combiner can be expressed as a 
single decision variable in the form 

$$ U = \mathrm{Re}\left(2\mathcal{E}\sum_{k=1}^{L}\alpha_k^2 + \sum_{k=1}^{L}\alpha_k N_k\right) = 2\mathcal{E}\sum_{k=1}^{L}\alpha_k^2 + \sum_{k=1}^{L}\alpha_k N_{kr} \tag{14-4-4} $$

where $N_{kr}$ denotes the real part of the complex-valued gaussian noise
variable

$$ N_k = e^{j\phi_k}\int_0^T z_k(t)\,s_k^*(t)\,dt \tag{14-4-5} $$

We follow the approach used in Section 14-3 in deriving the probability of
error. That is, the probability of error conditioned on a fixed set of
attenuation factors $\{\alpha_k\}$ is obtained first. Then the conditional
probability of error is averaged over the probability density function of the
$\{\alpha_k\}$.


For a fixed set of $\{\alpha_k\}$, the decision variable $U$ is gaussian with
mean

$$ E(U) = 2\mathcal{E}\sum_{k=1}^{L}\alpha_k^2 \tag{14-4-6} $$

and variance

$$ \sigma_U^2 = 2\mathcal{E}N_0\sum_{k=1}^{L}\alpha_k^2 \tag{14-4-7} $$

For these values of the mean and variance, the probability that $U$ is less
than zero is simply

$$ P_2(\gamma_b) = Q\left(\sqrt{2\gamma_b}\right) \tag{14-4-8} $$

where the SNR per bit, $\gamma_b$, is given as

$$ \gamma_b = \frac{\mathcal{E}}{N_0}\sum_{k=1}^{L}\alpha_k^2 = \sum_{k=1}^{L}\gamma_k \tag{14-4-9} $$

where $\gamma_k = \mathcal{E}\alpha_k^2/N_0$ is the instantaneous SNR on the
$k$th channel. Now we must determine the probability density function
$p(\gamma_b)$. This function is most easily determined via the characteristic
function of $\gamma_b$. First of all, we note that for $L = 1$,
$\gamma_b \equiv \gamma_1$ has a chi-square probability density function given
in (14-3-5). The characteristic function of $\gamma_1$ is easily shown to be


$$ \psi_{\gamma_1}(jv) = E\left(e^{jv\gamma_1}\right) = \frac{1}{1 - jv\bar\gamma_c} \tag{14-4-10} $$

where $\bar\gamma_c$ is the average SNR per channel, which is assumed to be
identical for all channels. That is,

$$ \bar\gamma_c = \frac{\mathcal{E}}{N_0}E(\alpha_k^2) \tag{14-4-11} $$

independent of $k$. This assumption applies for the results throughout this
section. Since the fading on the $L$ channels is mutually statistically
independent, the $\{\gamma_k\}$ are statistically independent, and, hence, the
characteristic function for the sum $\gamma_b$ is simply the result in
(14-4-10) raised to the $L$th power, i.e.,

$$ \psi_{\gamma_b}(jv) = \frac{1}{(1 - jv\bar\gamma_c)^L} \tag{14-4-12} $$

But this is the characteristic function of a chi-square-distributed random
variable with $2L$ degrees of freedom. It follows from (2-1-107) that the
probability density function $p(\gamma_b)$ is

$$ p(\gamma_b) = \frac{1}{(L-1)!\,\bar\gamma_c^{\,L}}\,\gamma_b^{L-1}e^{-\gamma_b/\bar\gamma_c} \tag{14-4-13} $$


The final step in this derivation is to average the conditional error 
probability given in (14-4-8) over the fading channel statistics. Thus, we 
evaluate the integral 


$$ P_2 = \int_0^\infty P_2(\gamma_b)\,p(\gamma_b)\,d\gamma_b \tag{14-4-14} $$

There is a closed-form solution for (14-4-14), which can be expressed as

$$ P_2 = \left[\tfrac12(1-\mu)\right]^L \sum_{k=0}^{L-1}\binom{L-1+k}{k}\left[\tfrac12(1+\mu)\right]^k \tag{14-4-15} $$

where, by definition,

$$ \mu = \sqrt{\frac{\bar\gamma_c}{1+\bar\gamma_c}} \tag{14-4-16} $$

When the average SNR per channel, $\bar\gamma_c$, satisfies the condition
$\bar\gamma_c \gg 1$, the term $\tfrac12(1+\mu) \approx 1$ and the term
$\tfrac12(1-\mu) \approx 1/(4\bar\gamma_c)$. Furthermore,

$$ \sum_{k=0}^{L-1}\binom{L-1+k}{k} = \binom{2L-1}{L} \tag{14-4-17} $$

Therefore, when $\bar\gamma_c$ is sufficiently large (greater than 10 dB), the
probability of error in (14-4-15) can be approximated as

$$ P_2 \approx \left(\frac{1}{4\bar\gamma_c}\right)^L\binom{2L-1}{L} \tag{14-4-18} $$

We observe from (14-4-18) that the probability of error varies as
$1/\bar\gamma_c$ raised to the $L$th power. Thus, with diversity, the error
rate decreases inversely with the $L$th power of the SNR.
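As a numerical check on (14-4-15) and (14-4-18), the following Python sketch
evaluates both expressions. The function names are illustrative and do not
come from the text; `gamma_c` stands for the average SNR per channel
$\bar\gamma_c$ as a linear ratio (not in dB).

```python
from math import comb, sqrt


def p2_diversity(mu, L):
    """Evaluate the closed form (14-4-15) for a given mu and diversity order L."""
    a = 0.5 * (1.0 - mu)
    b = 0.5 * (1.0 + mu)
    return a**L * sum(comb(L - 1 + k, k) * b**k for k in range(L))


def p2_psk(gamma_c, L):
    """Binary PSK with maximal ratio combining; mu is taken from (14-4-16)."""
    return p2_diversity(sqrt(gamma_c / (1.0 + gamma_c)), L)


def p2_psk_high_snr(gamma_c, L):
    """High-SNR approximation (14-4-18)."""
    return comb(2 * L - 1, L) / (4.0 * gamma_c) ** L


if __name__ == "__main__":
    gamma_c = 10 ** (20 / 10)  # 20 dB per channel, an arbitrary example value
    for L in (1, 2, 4):
        print(L, p2_psk(gamma_c, L), p2_psk_high_snr(gamma_c, L))
```

At a fixed SNR per channel, each additional branch steepens the decay of the
error rate, which is the $L$th-power behavior noted above.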

Having obtained the performance of binary PSK with diversity, we now turn 
our attention to binary, orthogonal FSK that is detected coherently. In this 
case, the two decision variables at the output of the maximal ratio combiner 
may be expressed as 


$$ U_1 = \mathrm{Re}\left(2\mathcal{E}\sum_{k=1}^{L}\alpha_k^2 + \sum_{k=1}^{L}\alpha_k N_{k1}\right), \qquad U_2 = \mathrm{Re}\left(\sum_{k=1}^{L}\alpha_k N_{k2}\right) \tag{14-4-19} $$

where we have assumed that signal $s_{k1}(t)$ was transmitted and where
$\{N_{k1}\}$ and $\{N_{k2}\}$ are the two sets of noise components at the
output of the matched filters. The probability of error is simply the
probability that $U_2 > U_1$. This computation is similar to the one performed
for PSK, except that we now have twice the noise power. Consequently, when the
$\{\alpha_k\}$ are fixed, the conditional probability of error is

$$ P_2(\gamma_b) = Q\left(\sqrt{\gamma_b}\right) \tag{14-4-20} $$

We use (14-4-13) to average $P_2(\gamma_b)$ over the fading. It is not
surprising to find that the result given in (14-4-15) still applies, with
$\bar\gamma_c$ replaced by $\tfrac12\bar\gamma_c$. That is, (14-4-15) is the
probability of error for binary, orthogonal FSK with coherent detection, where
the parameter $\mu$ is defined as

$$ \mu = \sqrt{\frac{\bar\gamma_c}{2+\bar\gamma_c}} \tag{14-4-21} $$

Furthermore, for large values of $\bar\gamma_c$, the performance $P_2$ can be
approximated as

$$ P_2 \approx \left(\frac{1}{2\bar\gamma_c}\right)^L\binom{2L-1}{L} \tag{14-4-22} $$


In comparing (14-4-22) with (14-4-18), we observe that the 3 dB difference in 
performance between PSK and orthogonal FSK with coherent detection, which 
exists in a nonfading, nondispersive channel, is the same also in a fading 
channel. 

In the above discussion of binary PSK and FSK, detected coherently, we
assumed that noiseless estimates of the complex-valued channel parameters
$\{\alpha_k e^{-j\phi_k}\}$ were used at the receiver. Since the channel is
time-variant, the parameters $\{\alpha_k e^{-j\phi_k}\}$ cannot be estimated
perfectly. In fact, on some channels, the time variations may be sufficiently
fast to preclude the implementation of coherent detection. In such a case, we
should consider using either DPSK or FSK with noncoherent detection.

Let us consider DPSK first. In order for DPSK to be a viable digital
signaling method, the channel variations must be sufficiently slow so that the
channel phase shifts $\{\phi_k\}$ do not change appreciably over two
consecutive signaling intervals. In our analysis, we assume that the channel
parameters $\{\alpha_k e^{-j\phi_k}\}$ remain constant over two successive
signaling intervals. Thus the combiner for binary DPSK will yield as an output
the decision variable

$$ U = \mathrm{Re}\left[\sum_{k=1}^{L}\left(2\mathcal{E}\alpha_k e^{-j\phi_k} + N_{k2}\right)\left(2\mathcal{E}\alpha_k e^{j\phi_k} + N_{k1}^*\right)\right] \tag{14-4-23} $$


where $\{N_{k1}\}$ and $\{N_{k2}\}$ denote the received noise components at
the output of the matched filters in the two consecutive signaling intervals.
The probability of error is simply the probability that $U < 0$. Since $U$ is
a special case of the general quadratic form in complex-valued gaussian random
variables treated in Appendix B, the probability of error can be obtained
directly from the results given in that appendix. Alternatively, we may use
the error probability given in (12-1-3), which applies to binary DPSK
transmitted over $L$ time-invariant channels, and average it over the Rayleigh
fading channel statistics. Thus, we have the conditional error probability

$$ P_2(\gamma_b) = \left(\tfrac12\right)^{2L-1}e^{-\gamma_b}\sum_{k=0}^{L-1}b_k\gamma_b^k \tag{14-4-24} $$

where $\gamma_b$ is given by (14-4-9) and

$$ b_k = \frac{1}{k!}\sum_{n=0}^{L-1-k}\binom{2L-1}{n} \tag{14-4-25} $$


The average of $P_2(\gamma_b)$ over the fading channel statistics given by
$p(\gamma_b)$ in (14-4-13) is easily shown to be

$$ P_2 = \frac{1}{2^{2L-1}(L-1)!\,(1+\bar\gamma_c)^L}\sum_{k=0}^{L-1}b_k\,(L-1+k)!\left(\frac{\bar\gamma_c}{1+\bar\gamma_c}\right)^k \tag{14-4-26} $$

We indicate that the result in (14-4-26) can be manipulated into the form
given in (14-4-15), which applies also to coherent PSK and FSK. For binary
DPSK, the parameter $\mu$ in (14-4-15) is defined as (see Appendix C)

$$ \mu = \frac{\bar\gamma_c}{1+\bar\gamma_c} \tag{14-4-27} $$

For $\bar\gamma_c \gg 1$, the error probability in (14-4-26) can be
approximated by the expression

$$ P_2 \approx \left(\frac{1}{2\bar\gamma_c}\right)^L\binom{2L-1}{L} \tag{14-4-28} $$
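The stated equivalence between (14-4-26) and (14-4-15) with $\mu$ from
(14-4-27) can be spot-checked numerically. The sketch below is only such a
check, not a proof; the function names are illustrative and `gamma_c` is the
linear average SNR per channel.

```python
from math import comb, factorial


def p2_dpsk_sum(gamma_c, L):
    """Averaged binary DPSK error probability in the summation form (14-4-26)."""
    def b(k):
        # Coefficients b_k of (14-4-25): n runs from 0 to L-1-k inclusive.
        return sum(comb(2 * L - 1, n) for n in range(L - k)) / factorial(k)

    r = gamma_c / (1.0 + gamma_c)
    total = sum(b(k) * factorial(L - 1 + k) * r**k for k in range(L))
    return total / (2 ** (2 * L - 1) * factorial(L - 1) * (1.0 + gamma_c) ** L)


def p2_closed_form(mu, L):
    """The form (14-4-15)."""
    a = 0.5 * (1.0 - mu)
    b = 0.5 * (1.0 + mu)
    return a**L * sum(comb(L - 1 + k, k) * b**k for k in range(L))


if __name__ == "__main__":
    for L in (1, 2, 3, 4):
        g = 5.0  # arbitrary example SNR per channel
        print(L, p2_dpsk_sum(g, L), p2_closed_form(g / (1.0 + g), L))
```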


Orthogonal FSK with noncoherent detection is the final signaling technique
that we consider in this section. It is appropriate for both slow and fast
fading. However, the analysis of the performance presented below is based on
the assumption that the fading is sufficiently slow so that the channel
parameters $\{\alpha_k e^{-j\phi_k}\}$ remain constant for the duration of the
signaling interval. The combiner for the multichannel signals is a square-law
combiner. Its output consists of the two decision variables

$$ U_1 = \sum_{k=1}^{L}\left|2\mathcal{E}\alpha_k e^{-j\phi_k} + N_{k1}\right|^2, \qquad U_2 = \sum_{k=1}^{L}\left|N_{k2}\right|^2 \tag{14-4-29} $$

where $U_1$ is assumed to contain the signal. Consequently the probability of
error is the probability that $U_2 > U_1$.

As in DPSK, we have a choice of two approaches in deriving the performance of
FSK with square-law combining. In Section 12-1, we indicated that the
expression for the error probability for square-law-combined FSK is the same
as that for DPSK with $\gamma_b$ replaced by $\tfrac12\gamma_b$. That is, the
FSK system requires 3 dB of additional SNR to achieve the same performance on
a time-invariant channel. Consequently, the conditional error probability for
DPSK given in (14-4-24) applies to square-law-combined FSK when $\gamma_b$ is
replaced by $\tfrac12\gamma_b$. Furthermore, the result obtained by averaging
(14-4-24) over the fading, which is given by (14-4-26), must also apply to FSK
with $\bar\gamma_c$ replaced by $\tfrac12\bar\gamma_c$. But we also stated
previously that (14-4-26) and (14-4-15) are equivalent. Therefore, the error
probability given in (14-4-15) also applies to square-law-combined FSK with
the parameter $\mu$ defined as

$$ \mu = \frac{\bar\gamma_c}{2+\bar\gamma_c} \tag{14-4-30} $$


An alternative derivation used by Pierce (1958) to obtain the probability
that the decision variable $U_2 > U_1$ is just as easy as the method described
above. It begins with the probability density functions $p(U_1)$ and $p(U_2)$.
Since the complex-valued random variables $\{\alpha_k e^{-j\phi_k}\}$,
$\{N_{k1}\}$, and $\{N_{k2}\}$ are zero-mean gaussian-distributed, the decision
variables $U_1$ and $U_2$ are distributed according to a chi-square
probability distribution with $2L$ degrees of freedom. That is,

$$ p(U_1) = \frac{1}{(2\sigma_1^2)^L(L-1)!}\,U_1^{L-1}\exp\left(-\frac{U_1}{2\sigma_1^2}\right) \tag{14-4-31} $$

where

$$ \sigma_1^2 = \tfrac12 E\left(\left|2\mathcal{E}\alpha_k e^{-j\phi_k} + N_{k1}\right|^2\right) = 2\mathcal{E}N_0(1+\bar\gamma_c) $$

Similarly,

$$ p(U_2) = \frac{1}{(2\sigma_2^2)^L(L-1)!}\,U_2^{L-1}\exp\left(-\frac{U_2}{2\sigma_2^2}\right) \tag{14-4-32} $$

where

$$ \sigma_2^2 = 2\mathcal{E}N_0 $$

The probability of error is just the probability that $U_2 > U_1$. It is left
as an exercise for the reader to show that this probability is given by
(14-4-15), where $\mu$ is defined by (14-4-30).

When $\bar\gamma_c \gg 1$, the performance of square-law-detected FSK can be
simplified as we have done for the other binary multichannel systems. In this
case, the error rate is well approximated by the expression

$$ P_2 \approx \left(\frac{1}{\bar\gamma_c}\right)^L\binom{2L-1}{L} \tag{14-4-33} $$

The error rate performance of PSK, DPSK, and square-law-detected orthogonal
FSK is illustrated in Fig. 14-4-2 for $L = 1$, 2, and 4. The performance is
plotted as a function of the average SNR per bit, $\bar\gamma_b$, which is
related to the average SNR per channel, $\bar\gamma_c$, by the formula

$$ \bar\gamma_b = L\bar\gamma_c \tag{14-4-34} $$

FIGURE 14-4-2 Performance of binary signals with diversity.

The results in Fig. 14-4-2 clearly illustrate the advantage of diversity as a
means for overcoming the severe penalty in SNR caused by fading.
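Since all four binary schemes share the form (14-4-15) and differ only in the
definition of $\mu$, curves of this kind can be tabulated with a few lines of
Python. The sketch below is illustrative only: the dictionary `MU` and the
function names are assumptions of this example, and the SNR per bit is
converted to the SNR per channel through (14-4-34).

```python
from math import comb, sqrt

# mu as a function of the average SNR per channel, from (14-4-16),
# (14-4-21), (14-4-27) and (14-4-30) respectively.
MU = {
    "PSK": lambda g: sqrt(g / (1.0 + g)),
    "coherent FSK": lambda g: sqrt(g / (2.0 + g)),
    "DPSK": lambda g: g / (1.0 + g),
    "noncoherent FSK": lambda g: g / (2.0 + g),
}


def p2_diversity(mu, L):
    """The common closed form (14-4-15)."""
    a = 0.5 * (1.0 - mu)
    b = 0.5 * (1.0 + mu)
    return a**L * sum(comb(L - 1 + k, k) * b**k for k in range(L))


def p2_binary(scheme, gamma_b_db, L):
    """Error probability versus average SNR per bit (in dB) for L branches."""
    gamma_c = 10 ** (gamma_b_db / 10) / L  # (14-4-34)
    return p2_diversity(MU[scheme](gamma_c), L)


if __name__ == "__main__":
    for scheme in MU:
        print(scheme, [p2_binary(scheme, 20.0, L) for L in (1, 2, 4)])
```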


14-4-2 Multiphase Signals

Multiphase signaling over a Rayleigh fading channel is the topic presented in
some detail in Appendix C. Our main purpose in this section is to cite the
general result for the probability of a symbol error in $M$-ary PSK and DPSK
systems and the probability of a bit error in four-phase PSK and DPSK.

The general result for the probability of a symbol error in $M$-ary PSK and
DPSK is

$$ P_M = \frac{(-1)^{L-1}(1-\mu^2)^L}{\pi(L-1)!}\left(\frac{\partial^{L-1}}{\partial b^{L-1}}\left\{\frac{1}{b-\mu^2}\left[\frac{\pi}{M}(M-1) - \frac{\mu\sin(\pi/M)}{\sqrt{b-\mu^2\cos^2(\pi/M)}}\cot^{-1}\frac{-\mu\cos(\pi/M)}{\sqrt{b-\mu^2\cos^2(\pi/M)}}\right]\right\}\right)_{b=1} \tag{14-4-35} $$

where

$$ \mu = \sqrt{\frac{\bar\gamma_c}{1+\bar\gamma_c}} \tag{14-4-36} $$

for coherent PSK and

$$ \mu = \frac{\bar\gamma_c}{1+\bar\gamma_c} \tag{14-4-37} $$

for DPSK. Again, $\bar\gamma_c$ is the average received SNR per channel. The
SNR per bit is $\bar\gamma_b = L\bar\gamma_c/k$, where $k = \log_2 M$.

The bit error rate for four-phase PSK and DPSK is derived on the basis that
the pair of information bits is mapped into the four phases according to a
Gray code. The expression for the bit error rate derived in Appendix C is

$$ P_b = \frac12\left[1 - \frac{\mu}{\sqrt{2-\mu^2}}\sum_{k=0}^{L-1}\binom{2k}{k}\left(\frac{1-\mu^2}{4-2\mu^2}\right)^k\right] \tag{14-4-38} $$

where $\mu$ is again given by (14-4-36) and (14-4-37) for PSK and DPSK,
respectively.
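A short Python sketch of the four-phase bit error rate, under the assumption
that it has the form $P_b = \tfrac12\left[1 - \frac{\mu}{\sqrt{2-\mu^2}}
\sum_{k=0}^{L-1}\binom{2k}{k}\left(\frac{1-\mu^2}{4-2\mu^2}\right)^k\right]$,
is given below. As a consistency check it compares coherent four-phase PSK with
binary PSK from (14-4-15) at the same average SNR per bit, using
$\bar\gamma_b = L\bar\gamma_c/k$. The function names are illustrative.

```python
from math import comb, sqrt


def pb_four_phase(mu, L):
    """Bit error rate of Gray-coded four-phase PSK or DPSK with L-fold diversity."""
    x = (1.0 - mu**2) / (4.0 - 2.0 * mu**2)
    s = sum(comb(2 * k, k) * x**k for k in range(L))
    return 0.5 * (1.0 - mu / sqrt(2.0 - mu**2) * s)


def mu_psk(gamma_c):
    """(14-4-36), coherent PSK."""
    return sqrt(gamma_c / (1.0 + gamma_c))


def mu_dpsk(gamma_c):
    """(14-4-37), DPSK."""
    return gamma_c / (1.0 + gamma_c)


def p2_diversity(mu, L):
    """Binary result (14-4-15), used here only for comparison."""
    a = 0.5 * (1.0 - mu)
    b = 0.5 * (1.0 + mu)
    return a**L * sum(comb(L - 1 + k, k) * b**k for k in range(L))


if __name__ == "__main__":
    gamma_b = 10.0  # average SNR per bit (linear), an arbitrary example value
    for L in (1, 2, 4):
        four = pb_four_phase(mu_psk(2.0 * gamma_b / L), L)  # k = 2 bits per symbol
        two = p2_diversity(mu_psk(gamma_b / L), L)          # k = 1 bit per symbol
        print(L, four, two, pb_four_phase(mu_dpsk(2.0 * gamma_b / L), L))
```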

Figure 14-4-3 illustrates the probability of a symbol error of DPSK and
coherent PSK for $M = 2$, 4, and 8 with $L = 1$. Note that the difference in
performance between DPSK and coherent PSK is approximately 3 dB for all three
values of $M$. In fact, when $\bar\gamma_b \gg 1$ and $L = 1$, (14-4-35) is
well approximated as

$$ P_M \approx \frac{M-1}{(M\log_2 M)\left[\sin^2(\pi/M)\right]\bar\gamma_b} \tag{14-4-39} $$

for DPSK and as

$$ P_M \approx \frac{M-1}{(M\log_2 M)\left[\sin^2(\pi/M)\right]2\bar\gamma_b} \tag{14-4-40} $$

for PSK. Hence, at high SNR, coherent PSK is 3 dB better than DPSK on a
Rayleigh fading channel. This difference also holds as $L$ is increased.
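The sketch below evaluates the two high-SNR approximations and, for
comparison, the $L = 1$ case of (14-4-35), in which no derivative is required
and the expression is simply evaluated at $b = 1$. The function names are
illustrative, and the inverse cotangent is taken with range $(0, \pi)$, which
is an assumption of this example. The comparison is printed rather than
asserted, since (14-4-39) and (14-4-40) are given only as approximations.

```python
from math import atan, cos, log2, pi, sin, sqrt


def pm_exact_L1(M, mu):
    """Symbol error probability from (14-4-35) with L = 1 and b = 1."""
    s, c = sin(pi / M), cos(pi / M)
    d = sqrt(1.0 - (mu * c) ** 2)
    acot = pi / 2 - atan(-mu * c / d)  # inverse cotangent on (0, pi)
    return ((pi / M) * (M - 1) - (mu * s / d) * acot) / pi


def pm_dpsk_high_snr(M, gamma_b):
    """Approximation (14-4-39); gamma_b is the linear average SNR per bit."""
    return (M - 1) / (M * log2(M) * sin(pi / M) ** 2 * gamma_b)


def pm_psk_high_snr(M, gamma_b):
    """Approximation (14-4-40)."""
    return (M - 1) / (M * log2(M) * sin(pi / M) ** 2 * 2.0 * gamma_b)


if __name__ == "__main__":
    gamma_b = 1000.0  # 30 dB per bit, an arbitrary example value
    for M in (2, 4, 8):
        gamma_c = gamma_b * log2(M)  # L = 1, so gamma_c = k * gamma_b
        mu = sqrt(gamma_c / (1.0 + gamma_c))  # coherent PSK, (14-4-36)
        print(M, pm_exact_L1(M, mu), pm_psk_high_snr(M, gamma_b),
              pm_dpsk_high_snr(M, gamma_b))
```

The factor of two between the two approximations corresponds to the 3 dB
difference quoted above.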

Bit error probabilities are depicted in Fig. 14-4-4 for two-phase, four-phase,
and eight-phase DPSK signaling with $L = 1$, 2, and 4. The expression for the
bit error probability of eight-phase DPSK with Gray encoding is not given
here, but it is available in the paper by Proakis (1968). In this case, we
observe that the performances for two- and four-phase DPSK are (approximately)
the same, while that for eight-phase DPSK is about 3 dB poorer. Although we
have not shown the bit error probability for coherent PSK, it can be
demonstrated that two- and four-phase coherent PSK also yield approximately
the same performance.

FIGURE 14-4-3 Probability of symbol error for PSK and DPSK for Rayleigh fading.

14-4-3 M-ary Orthogonal Signals

In this subsection, we determine the performance of $M$-ary orthogonal signals
transmitted over a Rayleigh fading channel and we assess the advantages of
higher-order signal alphabets relative to a binary alphabet. The orthogonal
signals may be viewed as $M$-ary FSK with a minimum frequency separation of
an integer multiple of $1/T$, where $T$ is the signaling interval. The same
information-bearing signal is transmitted on $L$ diversity channels. Each
diversity channel is assumed to be frequency-nonselective and slowly fading,
and the fading processes on the $L$ channels are assumed to be mutually
statistically independent. An additive white gaussian noise process corrupts
the signal on each diversity channel. We assume that the additive noise
processes are mutually statistically independent.

FIGURE 14-4-4 Probability of a bit error for DPSK with diversity for Rayleigh fading.

Although it is relatively easy to formulate the structure and analyze the
performance of a maximal ratio combiner for the diversity channels in the
$M$-ary communication system, it is more likely that a practical system would
employ noncoherent detection. Consequently, we confine our attention to
square-law combining of the diversity signals. The output of the combiner
containing the signal is

$$ U_1 = \sum_{k=1}^{L}\left|2\mathcal{E}\alpha_k e^{-j\phi_k} + N_{k1}\right|^2 \tag{14-4-41} $$

while the outputs of the remaining $M - 1$ combiners are

$$ U_m = \sum_{k=1}^{L}\left|N_{km}\right|^2, \quad m = 2, 3, 4, \ldots, M \tag{14-4-42} $$

The probability of error is simply 1 minus the probability that $U_1 > U_m$
for $m = 2, 3, \ldots, M$. Since the signals are orthogonal and the additive
noise processes are mutually statistically independent, the random variables
$U_1, U_2, \ldots, U_M$ are also mutually statistically independent. The
probability density function of $U_1$ was given in (14-4-31). On the other
hand, $U_2, \ldots, U_M$ are identically distributed and described by the
marginal probability density function in (14-4-32). With $U_1$ fixed, the
joint probability $P(U_2 < U_1, U_3 < U_1, \ldots, U_M < U_1)$ is equal to
$P(U_2 < U_1)$ raised to the $M - 1$ power. Now,

$$ P(U_2 < U_1) = \int_0^{U_1} p(U_2)\,dU_2 = 1 - \exp\left(-\frac{U_1}{2\sigma_2^2}\right)\sum_{k=0}^{L-1}\frac{1}{k!}\left(\frac{U_1}{2\sigma_2^2}\right)^k \tag{14-4-43} $$

where $\sigma_2^2 = 2\mathcal{E}N_0$. The $M - 1$ power of this probability is
then averaged over the probability density function of $U_1$ to yield the
probability of a correct decision. If we subtract this result from unity, we
obtain the probability of error in the form given by Hahn (1962)

$$ P_M = 1 - \int_0^\infty \frac{1}{(2\sigma_1^2)^L(L-1)!}\,U_1^{L-1}\exp\left(-\frac{U_1}{2\sigma_1^2}\right)\left[1 - \exp\left(-\frac{U_1}{2\sigma_2^2}\right)\sum_{k=0}^{L-1}\frac{1}{k!}\left(\frac{U_1}{2\sigma_2^2}\right)^k\right]^{M-1}dU_1 $$

$$ \phantom{P_M} = 1 - \int_0^\infty \frac{1}{(1+\bar\gamma_c)^L(L-1)!}\,U_1^{L-1}\exp\left(-\frac{U_1}{1+\bar\gamma_c}\right)\left(1 - e^{-U_1}\sum_{k=0}^{L-1}\frac{U_1^k}{k!}\right)^{M-1}dU_1 \tag{14-4-44} $$

where $\bar\gamma_c$ is the average SNR per diversity channel and, in the
second line, the variable of integration has been normalized by
$2\sigma_2^2$. The average SNR per bit is
$\bar\gamma_b = L\bar\gamma_c/\log_2 M = L\bar\gamma_c/k$.
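The normalized integral in (14-4-44) lends itself to direct numerical
evaluation. The following Python sketch uses Simpson's rule from the standard
library only; the truncation point `u_max` and the number of steps are
assumptions of this example, not values from the text. To avoid subtracting
two nearly equal numbers, the integral is rewritten as
$\int_0^\infty p(u)\left[1 - (1 - e^{-u}\sum_{k<L}u^k/k!)^{M-1}\right]du$,
which is valid because $p(u)$ integrates to one.

```python
from math import comb, exp, factorial


def pm_orthogonal_numeric(M, L, gamma_c, u_max=80.0, steps=8000):
    """Simpson's-rule evaluation of the normalized form of (14-4-44)."""
    def integrand(u):
        tail = exp(-u) * sum(u**k / factorial(k) for k in range(L))
        pdf = (u ** (L - 1) * exp(-u / (1.0 + gamma_c))
               / ((1.0 + gamma_c) ** L * factorial(L - 1)))
        return pdf * (1.0 - (1.0 - tail) ** (M - 1))

    h = u_max / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(u_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def p2_square_law(gamma_c, L):
    """Binary reference: (14-4-15) with mu from (14-4-30), i.e. p = 1/(2 + gamma_c)."""
    p = 1.0 / (2.0 + gamma_c)
    return p**L * sum(comb(L - 1 + k, k) * (1.0 - p) ** k for k in range(L))


if __name__ == "__main__":
    for M in (2, 4, 8):
        print(M, [pm_orthogonal_numeric(M, L, 10.0) for L in (1, 2, 4)])
```

For $M = 2$ the integral should reproduce the binary square-law result, which
gives a convenient sanity check on the step size.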

The integral in (14-4-44) can be expressed in closed form as a double
summation. This can be seen if we write

$$ \left(\sum_{k=0}^{L-1}\frac{U_1^k}{k!}\right)^m = \sum_{k=0}^{m(L-1)}\beta_{km}U_1^k \tag{14-4-45} $$

where $\beta_{km}$ is the set of coefficients in the above expansion. Then it
follows that (14-4-44) reduces to

$$ P_M = \frac{1}{(L-1)!}\sum_{m=1}^{M-1}\frac{(-1)^{m+1}\binom{M-1}{m}}{(1+m+m\bar\gamma_c)^L}\sum_{k=0}^{m(L-1)}\beta_{km}\,(L-1+k)!\left(\frac{1+\bar\gamma_c}{1+m+m\bar\gamma_c}\right)^k \tag{14-4-46} $$

When there is no diversity ($L = 1$), the error probability in (14-4-46)
reduces to the simple form

$$ P_M = \sum_{m=1}^{M-1}\frac{(-1)^{m+1}\binom{M-1}{m}}{1+m+m\bar\gamma_c} \tag{14-4-47} $$

The symbol error rate $P_M$ may be converted to an equivalent bit error rate
by multiplying $P_M$ with $2^{k-1}/(2^k - 1)$.

Although the expression for $P_M$ given in (14-4-46) is in closed form, it is
computationally cumbersome to evaluate for large values of $M$ and $L$. An
alternative is to evaluate $P_M$ by numerical integration, using the
expression in (14-4-44). The results illustrated in the following graphs were
generated from (14-4-44).

FIGURE 14-4-5 Performance of square-law-detected binary orthogonal signals as
a function of diversity.

FIGURE 14-4-6 Performance of square-law-detected M = 4 orthogonal signals as a
function of diversity.

First of all, let us observe the error rate performance of $M$-ary orthogonal
signaling with square-law combining as a function of the order of diversity.
Figures 14-4-5 and 14-4-6 illustrate the characteristics of $P_M$ for $M = 2$
and 4 as a function of $L$ when the total SNR, defined as
$\gamma_t = L\bar\gamma_c$, remains fixed. These results indicate that there is
an optimum order of diversity for each $\gamma_t$. That is, for any
$\gamma_t$, there is a value of $L$ for which $P_M$ is a minimum. A careful
observation of these graphs reveals that the minimum in $P_M$ is obtained when
$\bar\gamma_c = \gamma_t/L \approx 3$. This result appears to be independent of
the alphabet size $M$.
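The existence of an optimum diversity order can be explored numerically. The
sketch below covers only the binary case ($M = 2$), using (14-4-15) with $\mu$
from (14-4-30), which can be rewritten with $p = \tfrac12(1-\mu) =
1/(2+\bar\gamma_c)$. The function names, the search limit `L_max`, and the
example value of the total SNR are assumptions of this example.

```python
from math import comb


def p2_square_law(gamma_c, L):
    """Binary orthogonal signaling with square-law combining of L branches."""
    p = 1.0 / (2.0 + gamma_c)
    return p**L * sum(comb(L - 1 + k, k) * (1.0 - p) ** k for k in range(L))


def best_diversity_order(gamma_t, L_max=64):
    """Search L = 1..L_max for the smallest error rate at fixed total SNR gamma_t."""
    return min(range(1, L_max + 1), key=lambda L: p2_square_law(gamma_t / L, L))


if __name__ == "__main__":
    gamma_t = 60.0  # total SNR (linear), an arbitrary example value
    L_opt = best_diversity_order(gamma_t)
    print(L_opt, gamma_t / L_opt, p2_square_law(gamma_t / L_opt, L_opt))
```

The printed ratio `gamma_t / L_opt` can be compared with the value of about 3
read from the graphs.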

Second, let us observe the error rate $P_M$ as a function of the average SNR
per bit, defined as $\bar\gamma_b = L\bar\gamma_c/k$. (If we interpret $M$-ary
orthogonal FSK as a form of coding† and the order of diversity as the number
of times a symbol is repeated in a repetition code, then
$\bar\gamma_b = \bar\gamma_c/R_c$, where $R_c = k/L$ is the code rate.) The
graphs of $P_M$ versus $\bar\gamma_b$ for $M = 2, 4, 8, 16, 32$ and
$L = 1, 2, 4$ are shown in Fig. 14-4-7. These results illustrate the gain in
performance as $M$ increases and $L$ increases. First, we note that a
significant gain in performance is obtained by increasing $L$. Second, we note
that the gain in performance obtained with an increase in $M$ is relatively
small when $L$ is small. However, as $L$ increases, the gain achieved by
increasing $M$ also increases. Since an increase in either parameter results
in an expansion of bandwidth, i.e.,

$$ B_e = \frac{LM}{\log_2 M} $$

the results illustrated in Fig. 14-4-7 indicate that an increase in $L$ is
more efficient than a corresponding increase in $M$. As we shall see in
Section 14-6, coding is a bandwidth-effective means for obtaining diversity in
the signal transmitted over the fading channel.

† In Section 14-6, we show that $M$-ary orthogonal FSK with diversity may be
viewed as a block orthogonal code.

FIGURE 14-4-7 Performance of orthogonal signaling with M and L as parameters.

Chernoff Bound   Before concluding this section, we develop a Chernoff
upper bound on the error probability of binary orthogonal signaling with
$L$th-order diversity, which will be useful in our discussion of coding for
fading channels, the topic of Section 14-6. Our starting point is the
expression for the two decision variables $U_1$ and $U_2$ given by (14-4-29),
where $U_1$ consists of the square-law-combined signal-plus-noise terms and
$U_2$ consists of square-law-combined noise terms. The binary probability of
error, denoted here by $P_2(L)$, is

$$ P_2(L) = P(U_2 - U_1 > 0) = P(X > 0) = \int_0^\infty p(x)\,dx \tag{14-4-48} $$

where the random variable $X$ is defined as

$$ X = U_2 - U_1 = \sum_{k=1}^{L}\left(\left|N_{k2}\right|^2 - \left|2\mathcal{E}\alpha_k + N_{k1}\right|^2\right) \tag{14-4-49} $$

The phase terms $\{\phi_k\}$ in $U_1$ have been dropped since they do not
affect the performance of the square-law detector.

Let $S(X)$ denote the unit step function. Then the error probability in
(14-4-48) can be expressed in the form

$$ P_2(L) = E[S(X)] \tag{14-4-50} $$

Following the development in Section 2-1-5, the Chernoff bound is obtained by
overbounding the unit step function by an exponential function. That is,

$$ S(X) \le e^{\zeta X}, \quad \zeta \ge 0 \tag{14-4-51} $$

where the parameter $\zeta$ is optimized to yield a tight bound. Thus, we have

$$ P_2(L) = E[S(X)] \le E\left(e^{\zeta X}\right) \tag{14-4-52} $$

Upon substituting for the random variable $X$ from (14-4-49) and noting that
the random variables in the summation are mutually statistically independent,
we obtain the result

$$ P_2(L) \le \prod_{k=1}^{L}E\left(e^{\zeta|N_{k2}|^2}\right)E\left(e^{-\zeta|2\mathcal{E}\alpha_k + N_{k1}|^2}\right) \tag{14-4-53} $$

But

$$ E\left(e^{\zeta|N_{k2}|^2}\right) = \frac{1}{1 - 2\zeta\sigma_2^2}, \quad \zeta < \frac{1}{2\sigma_2^2} \tag{14-4-54} $$

and

$$ E\left(e^{-\zeta|2\mathcal{E}\alpha_k + N_{k1}|^2}\right) = \frac{1}{1 + 2\zeta\sigma_1^2}, \quad \zeta > -\frac{1}{2\sigma_1^2} \tag{14-4-55} $$

where $\sigma_2^2 = 2\mathcal{E}N_0$,
$\sigma_1^2 = 2\mathcal{E}N_0(1 + \bar\gamma_c)$, and $\bar\gamma_c$ is the
average SNR per diversity channel. Note that $\sigma_1^2$ and $\sigma_2^2$ are
independent of $k$, i.e., the additive noise terms on the $L$ diversity
channels as well as the fading statistics are identically distributed.
Consequently, (14-4-53) reduces to

$$ P_2(L) \le \left[\frac{1}{(1 - 2\zeta\sigma_2^2)(1 + 2\zeta\sigma_1^2)}\right]^L, \quad 0 \le \zeta \le \frac{1}{2\sigma_2^2} \tag{14-4-56} $$

By differentiating the right-hand side of (14-4-56) with respect to $\zeta$,
we find that the upper bound is minimized when

$$ \zeta = \frac{\sigma_1^2 - \sigma_2^2}{4\sigma_1^2\sigma_2^2} \tag{14-4-57} $$

Substitution of (14-4-57) for $\zeta$ into (14-4-56) yields the Chernoff upper
bound in the form

$$ P_2(L) \le \left[\frac{4(1 + \bar\gamma_c)}{(2 + \bar\gamma_c)^2}\right]^L \tag{14-4-58} $$

It is interesting to note that (14-4-58) may also be expressed as

$$ P_2(L) \le [4p(1 - p)]^L \tag{14-4-59} $$

where $p = 1/(2 + \bar\gamma_c)$ is the probability of error for binary
orthogonal signaling on a fading channel without diversity.

A comparison of the Chernoff bound in (14-4-58) with the exact error
probability for binary orthogonal signaling and square-law combining of the
$L$ diversity signals, which is given by the expression

$$ P_2(L) = \left(\frac{1}{2 + \bar\gamma_c}\right)^L\sum_{k=0}^{L-1}\binom{L-1+k}{k}\left(\frac{1 + \bar\gamma_c}{2 + \bar\gamma_c}\right)^k = p^L\sum_{k=0}^{L-1}\binom{L-1+k}{k}(1 - p)^k \tag{14-4-60} $$

reveals the tightness of the bound. Figure 14-4-8 illustrates this comparison.

FIGURE 14-4-8 Comparison of Chernoff bound with exact error probability.

We observe that the Chernoff upper bound is approximately 6 dB from the exact
error probability for $L = 1$, but, as $L$ increases, it becomes tighter. For
example, the difference between the bound and the exact error probability is
about 2.5 dB when $L = 4$.
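The gap between the bound and the exact result can be measured numerically by
finding, for each expression, the SNR per channel needed to reach a target
error rate. The Python sketch below does this by bisection on a logarithmic
scale; it takes the exact probability as $p^L\sum_{k=0}^{L-1}\binom{L-1+k}{k}
(1-p)^k$ and the bound as $[4p(1-p)]^L$ with $p = 1/(2+\bar\gamma_c)$. The
target of $10^{-4}$, the search limits, and the function names are assumptions
of this example.

```python
from math import comb, log10


def p2_exact(gamma_c, L):
    """Exact error probability for binary orthogonal signals, square-law combining."""
    p = 1.0 / (2.0 + gamma_c)
    return p**L * sum(comb(L - 1 + k, k) * (1.0 - p) ** k for k in range(L))


def p2_chernoff(gamma_c, L):
    """Chernoff upper bound [4p(1-p)]^L."""
    p = 1.0 / (2.0 + gamma_c)
    return (4.0 * p * (1.0 - p)) ** L


def snr_db_for(target, fn, L, lo=1e-3, hi=1e9):
    """SNR per channel (dB) at which the decreasing function fn reaches target."""
    for _ in range(200):
        mid = (lo * hi) ** 0.5  # geometric midpoint
        if fn(mid, L) > target:
            lo = mid
        else:
            hi = mid
    return 10.0 * log10(lo)


def gap_db(target, L):
    """Horizontal distance in dB between the bound and the exact curve."""
    return snr_db_for(target, p2_chernoff, L) - snr_db_for(target, p2_exact, L)


if __name__ == "__main__":
    for L in (1, 2, 4):
        print(L, gap_db(1e-4, L))
```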

Finally we mention that the error probability for $M$-ary orthogonal signaling
with diversity can be upper-bounded by means of the union bound

$$ P_M \le (M - 1)P_2(L) \tag{14-4-61} $$

where we may use either the exact expression given in (14-4-60) or the
Chernoff bound in (14-4-58) for $P_2(L)$.


14-5 DIGITAL SIGNALING OVER A FREQUENCY- 
SELECTIVE, SLOWLY FADING CHANNEL 

When the spread factor of the channel satisfies the condition
$T_m B_d \ll 1$, it is possible to select signals having a bandwidth
$W \ll (\Delta f)_c$ and a signal duration $T \ll (\Delta t)_c$. Thus, the
channel is frequency-nonselective and slowly fading. In such a channel,
diversity techniques can be employed to overcome the severe consequences of
fading.

When a bandwidth $W \gg (\Delta f)_c$ is available to the user, the channel
can be subdivided into a number of frequency-division multiplexed (FDM)
subchannels having a mutual separation in center frequencies of at least
$(\Delta f)_c$. Then the same signal can be transmitted on the FDM
subchannels, and, thus, frequency diversity is obtained. In this section, we
describe an alternative method.


14-5-1 A Tapped-Delay-Line Channel Model 

As we shall now demonstrate, a more direct method for achieving basically the
same result is to employ a wideband signal covering the bandwidth $W$. The
channel is still assumed to be slowly fading by virtue of the assumption that
$T \ll (\Delta t)_c$. Now suppose that $W$ is the bandwidth occupied by the
real bandpass signal. Then the band occupancy of the equivalent lowpass signal
$s_l(t)$ is $|f| \le \tfrac12 W$. Since $s_l(t)$ is band-limited to
$|f| \le \tfrac12 W$, application of the sampling theorem results in the
signal representation

$$ s_l(t) = \sum_{n=-\infty}^{\infty} s_l\left(\frac{n}{W}\right)\frac{\sin[\pi W(t - n/W)]}{\pi W(t - n/W)} \tag{14-5-1} $$

The Fourier transform of $s_l(t)$ is

$$ S_l(f) = \begin{cases} \dfrac{1}{W}\displaystyle\sum_{n=-\infty}^{\infty} s_l(n/W)\,e^{-j2\pi fn/W} & (|f| \le \tfrac12 W) \\ 0 & (|f| > \tfrac12 W) \end{cases} \tag{14-5-2} $$

The noiseless received signal from a frequency-selective channel was
previously expressed in the form

$$ r_l(t) = \int_{-\infty}^{\infty} C(f;t)\,S_l(f)\,e^{j2\pi ft}\,df \tag{14-5-3} $$

where $C(f;t)$ is the time-variant transfer function. Substitution for
$S_l(f)$ from (14-5-2) into (14-5-3) yields

$$ r_l(t) = \frac{1}{W}\sum_{n=-\infty}^{\infty} s_l(n/W)\int_{-\infty}^{\infty} C(f;t)\,e^{j2\pi f(t - n/W)}\,df = \frac{1}{W}\sum_{n=-\infty}^{\infty} s_l(n/W)\,c(t - n/W;t) \tag{14-5-4} $$

where $c(\tau;t)$ is the time-variant impulse response. We observe that
(14-5-4) has the form of a convolution sum. Hence, it can also be expressed in
the alternative form

$$ r_l(t) = \frac{1}{W}\sum_{n=-\infty}^{\infty} s_l(t - n/W)\,c(n/W;t) \tag{14-5-5} $$

It is convenient to define a set of time-variable channel coefficients as

$$ c_n(t) = \frac{1}{W}\,c\left(\frac{n}{W};t\right) \tag{14-5-6} $$

Then (14-5-5) expressed in terms of these channel coefficients becomes

$$ r_l(t) = \sum_{n=-\infty}^{\infty} c_n(t)\,s_l(t - n/W) \tag{14-5-7} $$

The form for the received signal in (14-5-7) implies that the time-variant
frequency-selective channel can be modeled or represented as a tapped delay
line with tap spacing $1/W$ and tap weight coefficients $\{c_n(t)\}$. In fact,
we deduce from (14-5-7) that the lowpass impulse response for the channel is

$$ c(\tau;t) = \sum_{n=-\infty}^{\infty} c_n(t)\,\delta(\tau - n/W) \tag{14-5-8} $$

and the corresponding time-variant transfer function is

$$ C(f;t) = \sum_{n=-\infty}^{\infty} c_n(t)\,e^{-j2\pi fn/W} \tag{14-5-9} $$

Thus, with an equivalent lowpass signal having a bandwidth $\tfrac12 W$, where
$W \gg (\Delta f)_c$, we achieve a resolution of $1/W$ in the multipath delay
profile. Since the total multipath spread is $T_m$, for all practical purposes
the tapped delay line model for the channel can be truncated at
$L = \lfloor T_m W \rfloor + 1$ taps. Then the noiseless received signal can be
expressed in the form

$$ r_l(t) = \sum_{n=1}^{L} c_n(t)\,s_l\left(t - \frac{n}{W}\right) \tag{14-5-10} $$

The truncated tapped delay line model is shown in Fig. 14-5-1. In accordance
with the statistical characterization of the channel presented in Section
14-1, the time-variant tap weights $\{c_n(t)\}$ are complex-valued stationary
random processes. In the special case of Rayleigh fading, the magnitudes
$|c_n(t)| \equiv \alpha_n(t)$ are Rayleigh-distributed and the phases
$\phi_n(t)$ are uniformly distributed. Since the $\{c_n(t)\}$ represent the tap
weights corresponding to the $L$ different delays $\tau = n/W$,
$n = 1, 2, \ldots, L$, the uncorrelated scattering assumption made in Section
14-1 implies that the $\{c_n(t)\}$ are mutually uncorrelated. When the
$\{c_n(t)\}$ are gaussian random processes, they are statistically
independent.

FIGURE 14-5-1 Tapped delay line model of frequency-selective channel.
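A discrete-time version of the truncated tapped delay line can be sketched in
a few lines of Python. The sketch assumes that the input is sampled at spacing
$1/W$, that the tap weights are frozen over the block (slow fading), and that
each tap is an independent zero-mean complex gaussian value so that its
magnitude is Rayleigh-distributed. The function names and the choice of equal
power per tap are assumptions of this example, not part of the model in the
text.

```python
import math
import random


def num_taps(T_m, W):
    """Number of taps L = floor(T_m * W) + 1 of the truncated delay line."""
    return math.floor(T_m * W) + 1


def rayleigh_taps(L, power_per_tap=1.0, rng=None):
    """Independent zero-mean complex gaussian tap weights c_1, ..., c_L."""
    rng = rng or random.Random()
    s = math.sqrt(power_per_tap / 2.0)  # standard deviation per real dimension
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(L)]


def tdl_output(samples, taps):
    """Noiseless output r[i] = sum_{n=1}^{L} c_n * s[i - n], taps held constant."""
    L = len(taps)
    out = [0j] * (len(samples) + L)
    for i in range(len(out)):
        for n in range(1, L + 1):
            if 0 <= i - n < len(samples):
                out[i] += taps[n - 1] * samples[i - n]
    return out


if __name__ == "__main__":
    L = num_taps(3.7, 1.0)  # e.g. T_m = 3.7 microseconds, W = 1 MHz
    taps = rayleigh_taps(L, rng=random.Random(0))
    print(L, tdl_output([1, -1, 1, 1], taps))
```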

14-5-2 The RAKE Demodulator

We now consider the problem of digital signaling over a frequency-selective
channel that is modeled by a tapped delay line with statistically independent
time-variant tap weights $\{c_n(t)\}$. It is apparent at the outset, however,
that the tapped delay line model with statistically independent tap weights
provides us with $L$ replicas of the same transmitted signal at the receiver.
Hence, a receiver that processes the received signal in an optimum manner will
achieve the performance of an equivalent $L$th-order diversity communications
system.

Let us consider binary signaling over the channel. We have two equal-energy
signals $s_{l1}(t)$ and $s_{l2}(t)$, which are either antipodal or orthogonal.
Their time duration $T$ is selected to satisfy the condition $T \gg T_m$.
Thus, we may neglect any intersymbol interference due to multipath. Since the
bandwidth of the signal exceeds the coherence bandwidth of the channel, the
received signal is expressed as

$$ r_l(t) = \sum_{k=1}^{L} c_k(t)\,s_{li}(t - k/W) + z(t) = v_i(t) + z(t), \quad 0 \le t \le T, \quad i = 1, 2 \tag{14-5-11} $$

where $z(t)$ is a complex-valued zero-mean white gaussian noise process.
Assume for the moment that the channel tap weights are known. Then the
optimum receiver consists of two filters matched to $v_1(t)$ and $v_2(t)$,
followed by samplers and a decision circuit that selects the signal
corresponding to the largest output. An equivalent optimum receiver employs
cross correlation instead of matched filtering. In either case, the decision
variables for coherent detection of the binary signals can be expressed as

$$ U_m = \mathrm{Re}\left[\int_0^T r_l(t)\,v_m^*(t)\,dt\right] = \mathrm{Re}\left[\sum_{k=1}^{L}\int_0^T r_l(t)\,c_k^*(t)\,s_{lm}^*(t - k/W)\,dt\right], \quad m = 1, 2 \tag{14-5-12} $$
Figure 14-5-2 illustrates the operations involved in the computation of the 
decision variables. In this realization of the optimum receiver, the two 
reference signals are delayed and correlated with the received signal r,(i). 

An alternative realization of the optimum receiver employs a single delay
line through which is passed the received signal $r_l(t)$. The signal at each tap is
correlated with $c_k^*(t)\, s_{lm}^*(t)$, where $k = 1, 2, \ldots, L$ and $m = 1, 2$. This receiver
structure is shown in Fig. 14-5-3. In effect, the tapped delay line receiver
attempts to collect the signal energy from all the received signal paths that fall
within the span of the delay line and carry the same information. Its action is
somewhat analogous to an ordinary garden rake and, consequently, the name
“RAKE receiver” has been coined for this receiver structure by Price and
Green (1958).
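The correlation in (14-5-12) can be illustrated with a small discrete-time sketch. It is only an illustration under stated assumptions: the signals are sampled at one sample per tap spacing $1/W$, taps are indexed from 0 rather than 1, the tap weights are constant and known, and the chip sequence, tap values, and function names are hypothetical.

```python
def delayed(seq, k, length):
    """Return seq delayed by k samples, zero-padded to the given length."""
    out = [0j] * length
    for n, x in enumerate(seq):
        out[n + k] = x
    return out


def channel_output(signal, taps):
    """Noiseless tapped-delay-line output: r[n] = sum_k c_k * s[n - k]."""
    length = len(signal) + len(taps) - 1
    r = [0j] * length
    for k, c in enumerate(taps):
        for n, x in enumerate(delayed(signal, k, length)):
            r[n] += c * x
    return r


def rake_decision_variables(received, references, taps):
    """U_m = Re[ sum_k conj(c_k) * sum_n r[n] * conj(s_m[n - k]) ]."""
    length = len(received)
    variables = []
    for ref in references:
        total = 0j
        for k, c in enumerate(taps):
            ref_k = delayed(ref, k, length)
            corr = sum(r * s.conjugate() for r, s in zip(received, ref_k))
            total += c.conjugate() * corr
        variables.append(total.real)
    return variables


# Hypothetical example: antipodal signaling with a short +/-1 chip sequence.
s1 = [complex(x) for x in (1, 1, 1, -1, -1, 1, -1)]
s2 = [-x for x in s1]                              # antipodal partner of s1
taps = [0.8 + 0.2j, -0.3 + 0.5j, 0.1 - 0.4j]       # assumed known tap weights
r = channel_output(s1, taps)                       # s1 transmitted, no noise
U = rake_decision_variables(r, [s1, s2], taps)
decision = 1 + U.index(max(U))                     # index of the largest U_m
```

In this noiseless case the reference weighted by the conjugate taps is exactly the received waveform, so $U_1$ equals the total received energy summed over all paths, which is the "raking" effect described above.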


14-5-3 Performance of RAKE Receiver 

We shall now evaluate the performance of the RAKE receiver under the
condition that the fading is sufficiently slow to allow us to estimate $c_k(t)$
perfectly (without noise). Furthermore, within any one signaling interval, $c_k(t)$
is treated as a constant and denoted as $c_k$. Thus the decision variables in
(14-5-12) may be expressed in the form

$$
U_m = \operatorname{Re}\left[ \sum_{k=1}^{L} c_k^* \int_0^T r_l(t)\, s_{lm}^*(t - k/W)\, dt \right], \qquad m = 1, 2
\tag{14-5-13}
$$

FIGURE 14-5-2 Optimum demodulator for wideband binary signals (delayed reference configuration).

Suppose the transmitted signal is $s_{l1}(t)$; then the received signal is

$$
r_l(t) = \sum_{n=1}^{L} c_n\, s_{l1}(t - n/W) + z(t), \qquad 0 \le t \le T
\tag{14-5-14}
$$

Substitution of (14-5-14) into (14-5-13) yields

$$
\begin{aligned}
U_m ={}& \operatorname{Re}\left[ \sum_{k=1}^{L} c_k^* \sum_{n=1}^{L} c_n \int_0^T s_{l1}(t - n/W)\, s_{lm}^*(t - k/W)\, dt \right] \\
      &+ \operatorname{Re}\left[ \sum_{k=1}^{L} c_k^* \int_0^T z(t)\, s_{lm}^*(t - k/W)\, dt \right], \qquad m = 1, 2
\end{aligned}
\tag{14-5-15}
$$

FIGURE 14-5-3 Optimum demodulator for wideband binary signals (delayed received signal configuration).


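Usually the wideband signals $s_{l1}(t)$ and $s_{l2}(t)$ are generated from pseudo-random
sequences, which result in signals that have the property

$$
\int_0^T s_{li}(t - n/W)\, s_{li}^*(t - k/W)\, dt \approx 0, \qquad k \ne n, \quad i = 1, 2
\tag{14-5-16}
$$

If we assume that our binary signals are designed to satisfy this property, then
(14-5-15) simplifies to†

$$
\begin{aligned}
U_m ={}& \operatorname{Re}\left[ \sum_{k=1}^{L} |c_k|^2 \int_0^T s_{l1}(t - k/W)\, s_{lm}^*(t - k/W)\, dt \right] \\
      &+ \operatorname{Re}\left[ \sum_{k=1}^{L} c_k^* \int_0^T z(t)\, s_{lm}^*(t - k/W)\, dt \right], \qquad m = 1, 2
\end{aligned}
\tag{14-5-17}
$$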


† Although the orthogonality property specified by (14-5-16) can be satisfied by proper
selection of the pseudo-random sequences, the cross-correlation of $s_{li}(t - n/W)$ with $s_{li}^*(t - k/W)$
gives rise to a signal-dependent self-noise, which ultimately limits the performance. For simplicity,
we do not consider the self-noise term in the following calculations. Consequently, the
performance results presented below should be considered as lower bounds (ideal RAKE). An
approximation to the performance of the RAKE can be obtained by treating the self-noise as an
additional gaussian noise component with noise power equal to its variance.

When the binary signals are antipodal, a single decision variable suffices. In
this case, (14-5-17) reduces to

$$
U_1 = \operatorname{Re}\left( 2\mathcal{E} \sum_{k=1}^{L} \alpha_k^2 + \sum_{k=1}^{L} \alpha_k N_k \right)
\tag{14-5-18}
$$

where $\alpha_k = |c_k|$ and

$$
N_k = e^{j\phi_k} \int_0^T z(t)\, s_l^*(t - k/W)\, dt
\tag{14-5-19}
$$


But (14-5-18) is identical to the decision variable given in (14-4-4), which
corresponds to the output of a maximal ratio combiner in a system with
$L$th-order diversity. Consequently, the RAKE receiver with perfect (noiseless)
estimates of the channel tap weights is equivalent to a maximal ratio combiner
in a system with $L$th-order diversity. Thus, when all the tap weights have the
same mean-square value, i.e., $E(\alpha_k^2)$ is the same for all $k$, the error rate
performance of the RAKE receiver is given by (14-4-15) and (14-4-16). On the
other hand, when the mean-square values $E(\alpha_k^2)$ are not identical for all $k$, the
derivation of the error rate performance must be repeated since (14-4-15) no
longer applies.

We shall derive the probability of error for binary antipodal and orthogonal
signals under the condition that the mean-square values of $\{\alpha_k\}$ are distinct.
We begin with the conditional error probability

$$
P_2(\gamma_b) = Q\left( \sqrt{\gamma_b (1 - \rho_r)} \right)
\tag{14-5-20}
$$

where $\rho_r = -1$ for antipodal signals, $\rho_r = 0$ for orthogonal signals, and

$$
\gamma_b = \frac{\mathcal{E}}{N_0} \sum_{k=1}^{L} \alpha_k^2 = \sum_{k=1}^{L} \gamma_k
\tag{14-5-21}
$$

Each of the $\{\gamma_k\}$ is distributed according to a chi-squared distribution with
two degrees of freedom. That is,

$$
p(\gamma_k) = \frac{1}{\bar\gamma_k}\, e^{-\gamma_k / \bar\gamma_k}
\tag{14-5-22}
$$

where $\bar\gamma_k$ is the average SNR for the $k$th path, defined as

$$
\bar\gamma_k = \frac{\mathcal{E}}{N_0}\, E(\alpha_k^2)
\tag{14-5-23}
$$

Furthermore, from (14-4-10) we know that the characteristic function of $\gamma_k$ is

$$
\psi_{\gamma_k}(jv) = \frac{1}{1 - jv\bar\gamma_k}
\tag{14-5-24}
$$

Since $\gamma_b$ is the sum of $L$ statistically independent components $\{\gamma_k\}$, the
characteristic function of $\gamma_b$ is

$$
\psi_{\gamma_b}(jv) = \prod_{k=1}^{L} \frac{1}{1 - jv\bar\gamma_k}
\tag{14-5-25}
$$

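The inverse Fourier transform of the characteristic function in (14-5-25) yields
the probability density function of $\gamma_b$ in the form

$$
p(\gamma_b) = \sum_{k=1}^{L} \frac{\pi_k}{\bar\gamma_k}\, e^{-\gamma_b / \bar\gamma_k}, \qquad \gamma_b \ge 0
\tag{14-5-26}
$$

where $\pi_k$ is defined as

$$
\pi_k = \prod_{\substack{i=1 \\ i \ne k}}^{L} \frac{\bar\gamma_k}{\bar\gamma_k - \bar\gamma_i}
\tag{14-5-27}
$$

When the conditional error probability in (14-5-20) is averaged over the
probability density function given in (14-5-26), the result is

$$
P_2 = \frac{1}{2} \sum_{k=1}^{L} \pi_k \left[ 1 - \sqrt{\frac{\bar\gamma_k (1 - \rho_r)}{2 + \bar\gamma_k (1 - \rho_r)}} \right]
\tag{14-5-28}
$$

This error probability can be approximated as ($\bar\gamma_k \gg 1$)

$$
P_2 \approx \binom{2L-1}{L} \prod_{k=1}^{L} \frac{1}{2\bar\gamma_k (1 - \rho_r)}
\tag{14-5-29}
$$

By comparing (14-5-29) for $\rho_r = -1$ with (14-4-18), we observe that the same
type of asymptotic behavior is obtained for the case of unequal SNR per path
and the case of equal SNR per path.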
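As a numerical illustration of (14-5-27) through (14-5-29), the following sketch evaluates the coefficients $\pi_k$, the exact average error probability, and its high-SNR approximation. The function names and the example SNR values are hypothetical; the SNRs are linear (not in dB) and must be distinct for $\pi_k$ to be defined.

```python
import math


def pi_coefficients(avg_snrs):
    """pi_k = prod_{i != k} g_k / (g_k - g_i), as in (14-5-27)."""
    coeffs = []
    for k, gk in enumerate(avg_snrs):
        prod = 1.0
        for i, gi in enumerate(avg_snrs):
            if i != k:
                prod *= gk / (gk - gi)
        coeffs.append(prod)
    return coeffs


def rake_error_probability(avg_snrs, rho_r):
    """Average error probability of (14-5-28); rho_r = -1 or 0."""
    total = 0.0
    for pk, gk in zip(pi_coefficients(avg_snrs), avg_snrs):
        x = gk * (1.0 - rho_r)
        total += pk * (1.0 - math.sqrt(x / (2.0 + x)))
    return 0.5 * total


def rake_error_asymptote(avg_snrs, rho_r):
    """High-SNR approximation of (14-5-29)."""
    L = len(avg_snrs)
    prod = 1.0
    for gk in avg_snrs:
        prod /= 2.0 * gk * (1.0 - rho_r)
    return math.comb(2 * L - 1, L) * prod


# Hypothetical three-path channel with unequal average SNRs (linear scale).
snrs = [100.0, 50.0, 20.0]
p_antipodal = rake_error_probability(snrs, -1.0)
p_orthogonal = rake_error_probability(snrs, 0.0)
```

Because (14-5-26) is a probability density function, the coefficients $\pi_k$ sum to 1, which is a convenient sanity check on any implementation. Note that individual $\pi_k$ can be large and of alternating sign when the SNRs are nearly equal, so the sum in (14-5-28) may lose numerical precision in that regime.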

In the derivation of the error rate performance of the RAKE receiver, we
assumed that the estimates of the channel tap weights are perfect. In practice,
relatively good estimates can be obtained if the channel fading is sufficiently
slow, e.g., $(\Delta t)_c / T \ge 100$, where $T$ is the signaling interval. Figure 14-5-4
illustrates a method for estimating the tap weights when the binary signaling
waveforms are orthogonal. The estimate is the output of the lowpass filter at
each tap. At any one instant in time, the incoming signal is either $s_{l1}(t)$ or $s_{l2}(t)$.
Hence, the input to the lowpass filter used to estimate $c_k(t)$ contains signal plus
noise from one of the correlators and noise only from the other correlator.
This method for channel estimation is not appropriate for antipodal signals,
because the addition of the two correlator outputs results in signal cancellation.
Instead, a single correlator can be employed for antipodal signals. Its output is
fed to the input of the lowpass filter after the information-bearing signal is
removed. To accomplish this, we must introduce a delay of one signaling
interval into the channel estimation procedure, as illustrated in Fig. 14-5-5.
That is, first the receiver must decide whether the information in the received
signal is +1 or -1 and, then, it uses the decision to remove the information
from the correlator output prior to feeding it to the lowpass filter.

FIGURE 14-5-6 RAKE demodulator for DPSK signals.

If we choose not to estimate the tap weights of the frequency-selective
channel, we may use either DPSK signaling or noncoherently detected
orthogonal signaling. The RAKE receiver structure for DPSK is illustrated in
Fig. 14-5-6. It is apparent that when the transmitted signal waveform $s_l(t)$
satisfies the orthogonality property given in (14-5-16), the decision variable is
identical to that given in (14-4-23) for an $L$th-order diversity system.
Consequently, the error rate performance of the RAKE receiver for binary DPSK
is identical to that given in (14-4-15) with $\mu = \bar\gamma_c / (1 + \bar\gamma_c)$, when all the signal
paths have the same SNR $\bar\gamma_c$. On the other hand, when the SNRs $\{\bar\gamma_k\}$ are
distinct, the error probability can be obtained by averaging (14-4-24), which is
the probability of error conditioned on a time-invariant channel, over the
probability density function of $\gamma_b$ given by (14-5-26). The result of this
integration is

$$
P_2 = \left(\tfrac{1}{2}\right)^{2L-1} \sum_{m=0}^{L-1} m!\, b_m \sum_{k=1}^{L} \frac{\pi_k}{\bar\gamma_k} \left( \frac{\bar\gamma_k}{1 + \bar\gamma_k} \right)^{m+1}
\tag{14-5-30}
$$

where $\pi_k$ is defined in (14-5-27) and $b_m$ in (14-4-25).

Finally, we consider binary orthogonal signaling over the frequency-selective
channel with square-law detection at the receiver. This type of signal
is appropriate when either the fading is rapid enough to preclude a good
estimate of the channel tap weights or the cost of implementing the channel
estimators is high. The RAKE receiver with square-law combining of the signal
from each tap is illustrated in Fig. 14-5-7. In computing its performance, we
again assume that the orthogonality property given in (14-5-16) holds. Then
the decision variables at the output of the RAKE are

$$
U_1 = \sum_{k=1}^{L} \left| 2\mathcal{E} c_k + N_{k1} \right|^2, \qquad
U_2 = \sum_{k=1}^{L} \left| N_{k2} \right|^2
$$

where $N_{k1}$ and $N_{k2}$ denote the noise components at the outputs of the
correlators for the $k$th tap, and we have assumed that $s_{l1}(t)$ was the transmitted
signal. Again we observe that the decision variables are identical to the ones
given in (14-4-29), which apply to orthogonal signals with $L$th-order diversity.
Therefore, the performance of the RAKE receiver for square-law-detected
orthogonal signals is given by (14-4-15) with $\mu = \bar\gamma_c / (2 + \bar\gamma_c)$ when all the
signal paths have the same SNR. If the SNRs are distinct, we can average the
conditional error probability given by (14-4-24), with $\gamma_b$ replaced by
$\tfrac{1}{2}\gamma_b$, over the probability density function $p(\gamma_b)$ given in (14-5-26).
The result of this averaging is given by (14-5-30), with $\bar\gamma_k$ replaced by
$\tfrac{1}{2}\bar\gamma_k$.

FIGURE 14-5-7 RAKE demodulator for square-law combination of orthogonal signals.
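The DPSK result (14-5-30) and its square-law orthogonal counterpart can be evaluated with the short sketch below. It rests on one assumption that is not restated in this section: the coefficients $b_m$ of (14-4-25) are taken to be $b_m = \frac{1}{m!}\sum_{k=0}^{L-1-m}\binom{2L-1}{k}$. Function names are hypothetical, and the average SNRs are linear and distinct.

```python
import math


def b_coefficient(m, L):
    """Assumed form of (14-4-25): b_m = (1/m!) * sum_{k=0}^{L-1-m} C(2L-1, k)."""
    return sum(math.comb(2 * L - 1, k) for k in range(L - m)) / math.factorial(m)


def pi_coefficients(avg_snrs):
    """pi_k = prod_{i != k} g_k / (g_k - g_i), as in (14-5-27)."""
    coeffs = []
    for k, gk in enumerate(avg_snrs):
        prod = 1.0
        for i, gi in enumerate(avg_snrs):
            if i != k:
                prod *= gk / (gk - gi)
        coeffs.append(prod)
    return coeffs


def dpsk_rake_error_probability(avg_snrs):
    """Error probability of (14-5-30) for binary DPSK with distinct path SNRs."""
    L = len(avg_snrs)
    pis = pi_coefficients(avg_snrs)
    total = 0.0
    for m in range(L):
        inner = sum(
            pk / gk * (gk / (1.0 + gk)) ** (m + 1)
            for pk, gk in zip(pis, avg_snrs)
        )
        total += math.factorial(m) * b_coefficient(m, L) * inner
    return 0.5 ** (2 * L - 1) * total


def square_law_orthogonal_error_probability(avg_snrs):
    """Same expression with each average SNR replaced by half its value."""
    return dpsk_rake_error_probability([0.5 * g for g in avg_snrs])
```

For a single path ($L = 1$) these expressions reduce to $1/[2(1 + \bar\gamma)]$ for DPSK and $1/(2 + \bar\gamma)$ for square-law-detected orthogonal signals, which gives a quick consistency check.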

In the above analysis, the RAKE demodulator shown in Fig. 14-5-7 for 
square-law combination of orthogonal signals is assumed to contain a signal 
component at each delay. If that is not the case, its performance will be 
degraded, since some of the tap correlators will contribute only noise. Under 
such conditions, the low-level, noise-only contributions from the tap
correlators should be excluded from the combiner, as shown by Chyi et al. (1988).

This concludes our discussion of signaling over a frequency-selective 
channel. The configurations of the RAKE receiver presented in this section 
can be easily generalized to multilevel signaling. In fact, if M-ary PSK or 
DPSK is chosen, the RAKE structures presented in this section remain 
unchanged. Only the PSK and DPSK detectors that follow the RAKE 
correlator are different. 

14-6 CODED WAVEFORMS FOR FADING CHANNELS 

Up to this point, we have demonstrated that diversity techniques are very 
effective in overcoming the detrimental effects of fading caused by the 
time-variant dispersive characteristics of the channel. Time- and/or frequency- 
diversity techniques may be viewed as a form of repetition (block) coding of 
the information sequence. From this point of view, the combining techniques 
described previously represent soft-decision decoding of the repetition code. 
Since a repetition code is a trivial form of coding, we shall now consider the 
additional benefits derived from more efficient types of codes. In particular, we 
demonstrate that coding provides an efficient means for obtaining diversity on 
a fading channel. The amount of diversity provided by a code is directly related 
to its minimum distance. 

As explained in Section 14-4, time diversity is obtained by transmitting the
signal components carrying the same information in multiple time intervals
mutually separated by an amount equal to or exceeding the coherence time
$(\Delta t)_c$ of the channel. Similarly, frequency diversity is obtained by transmitting
the signal components carrying the same information in multiple frequency
slots mutually separated by an amount at least equal to the coherence
bandwidth $(\Delta f)_c$ of the channel. Thus, the signal components carrying the same
information undergo statistically independent fading.

To extend these notions to a coded information sequence, we simply require
that the signal waveform corresponding to a particular code bit or code symbol
fade independently of the signal waveform corresponding to any other code bit
or code symbol. This requirement may result in inefficient utilization of the
available time-frequency space, with the existence of large unused portions in
this two-dimensional signaling space. To reduce the inefficiency, a number of
code words may be interleaved in time or in frequency or both, in such a
manner that the waveforms corresponding to the bits or symbols of a given code
word fade independently. Thus, we assume that the time-frequency signaling
space is partitioned into nonoverlapping time-frequency cells. A signal
waveform corresponding to a code bit or code symbol is transmitted within
such a cell.

In addition to the assumption of statistically independent fading of the 
signal components of a given code word, we also assume that the additive noise 
components corrupting the received signals are white gaussian processes that 
are statistically independent and identically distributed among the cells in the 
time-frequency space. Also, we assume that there is sufficient separation 
between adjacent cells so that intercell interference is negligible. 

An important issue is the modulation technique that is used to transmit the
coded information sequence. If the channel fades slowly enough to allow the
establishment of a phase reference, then PSK or DPSK may be employed. If
this is not possible, then FSK modulation with noncoherent detection at the
receiver is appropriate. In our treatment, we assume that it is not possible to
establish a phase reference or phase references for the signals in the different
cells occupied by the transmitted signal. Consequently, we choose FSK
modulation with noncoherent detection.

A model of the digital communications system for which the error rate 
performance will be evaluated is shown in Fig. 14-6-1. The encoder may be 
binary, nonbinary, or a concatenation of a nonbinary encoder with a binary 
encoder. Furthermore, the code generated by the encoder may be a block 
code, a convolutional code, or, in the case of concatenation, a mixture of a 
block code and a convolutional code. 

In order to explain the modulation, demodulation, and decoding for
FSK-type (orthogonal) signals, consider a linear binary block code in which $k$
information bits are encoded into a block of $n$ bits. For simplicity and without
loss of generality, let us assume that all $n$ bits of a code word are transmitted
simultaneously over the channel on multiple frequency cells. A code word $C_i$
having bits $\{c_{ij}\}$ is mapped into FSK signal waveforms in the following way. If
$c_{ij} = 0$, the tone $f_{0j}$ is transmitted, and if $c_{ij} = 1$, the tone $f_{1j}$ is transmitted. This
means that $2n$ tones or cells are available to transmit the $n$ bits of the code
word, but only $n$ tones are transmitted in any signaling interval. Since each
code word conveys $k$ bits of information, the bandwidth expansion factor for
FSK is $B_e = 2n/k$.
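The mapping from code bits to tones can be written down directly. In the sketch below, a tone $f_{rj}$ is represented by the index pair `(r, j)`; this representation and the function names are illustrative choices, not part of the text.

```python
def fsk_tone_indices(code_word):
    """Map bit c_j at position j to the tone index (c_j, j).

    (0, j) stands for tone f_0j and (1, j) for tone f_1j, so exactly one of
    the two tones reserved for position j is transmitted.
    """
    return [(bit, j) for j, bit in enumerate(code_word)]


def bandwidth_expansion_fsk(n, k):
    """B_e = 2n/k: 2n cells are reserved to convey k information bits."""
    return 2 * n / k


tones = fsk_tone_indices([0, 1, 1])          # -> tones f_00, f_11, f_12
golay_expansion = bandwidth_expansion_fsk(24, 12)
```

For a (24, 12) code this gives 48 reserved cells and a bandwidth expansion factor of 4.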

The demodulator for the received signal separates the signal into $2n$
spectral components corresponding to the available tone frequencies at the
transmitter. Thus, the demodulator can be realized as a bank of $2n$ filters,
where each filter is matched to one of the possible transmitted tones. The
outputs of the $2n$ filters are detected noncoherently. Since the Rayleigh fading
and the additive white gaussian noises in the $2n$ frequency cells are mutually
statistically independent and identically distributed random processes, the
optimum maximum-likelihood soft-decision decoding criterion requires that
these filter responses be square-law-detected and appropriately combined for
each code word to form the $M = 2^k$ decision variables. The code word
corresponding to the maximum of the decision variables is selected. If
hard-decision decoding is employed, the optimum maximum-likelihood
decoder selects the code word having the smallest Hamming distance relative to
the received code word.

FIGURE 14-6-1
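A minimal sketch of the hard-decision path is given below, assuming the squared envelopes of the $2n$ filter outputs are already available as two lists (one per tone family). The small (5, 2) linear code and the envelope values are hypothetical and serve only to exercise the functions.

```python
def hard_decisions(sq_env0, sq_env1):
    """Per-position bit decision: choose the tone with the larger squared envelope."""
    return [1 if e1 > e0 else 0 for e0, e1 in zip(sq_env0, sq_env1)]


def hamming_distance(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def hard_decision_decode(sq_env0, sq_env1, code_words):
    """Select the code word closest in Hamming distance to the hard decisions."""
    received = hard_decisions(sq_env0, sq_env1)
    return min(code_words, key=lambda cw: hamming_distance(cw, received))


# Hypothetical (5, 2) linear code with minimum distance 3.
code_words = [
    (0, 0, 0, 0, 0),
    (0, 1, 0, 1, 1),
    (1, 0, 1, 0, 1),
    (1, 1, 1, 1, 0),
]
# Illustrative squared envelopes when (1, 0, 1, 0, 1) is sent and the cell
# for position 1 is hit by a deep fade (the wrong tone has the larger output).
sq_env0 = [0.1, 0.3, 0.2, 1.5, 0.1]
sq_env1 = [2.0, 0.9, 1.1, 0.2, 1.7]
received_word = hard_decisions(sq_env0, sq_env1)
decoded = hard_decision_decode(sq_env0, sq_env1, code_words)
```

Here the hard decisions contain one bit error, which the minimum-distance rule corrects because the code has minimum distance 3.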

Although the discussion above assumed the use of a block code, a 
convolutional encoder can be easily accommodated in the block diagram 
shown in Fig. 14-6-1. For example, if a binary convolutional code is employed, 
each bit in the output sequence may be transmitted by binary FSK. The 
maximum-likelihood soft-decision decoding criterion for the convolutional 
code can be efficiently implemented by means of the Viterbi algorithm, in 
which the metrics for the surviving sequences at any point in the trellis consist 
of the square-law-combined outputs for the corresponding paths through the 
trellis. On the other hand, if hard-decision decoding is employed, the Viterbi 
algorithm is implemented with Hamming distance as the metric. 


14-6-1 Probability of Error for Soft-Decision Decoding of
Linear Binary Block Codes 

Consider the decoding of a linear binary $(n, k)$ code transmitted over a
Rayleigh fading channel, as described above. The optimum soft-decision
decoder, based on the maximum-likelihood criterion, forms the $M = 2^k$
decision variables

$$
\begin{aligned}
U_i &= \sum_{j=1}^{n} \left[ (1 - c_{ij}) |y_{0j}|^2 + c_{ij} |y_{1j}|^2 \right] \\
    &= \sum_{j=1}^{n} \left[ |y_{0j}|^2 + c_{ij} \left( |y_{1j}|^2 - |y_{0j}|^2 \right) \right], \qquad i = 1, 2, \ldots, 2^k
\end{aligned}
\tag{14-6-1}
$$

where $|y_{rj}|^2$, $j = 1, 2, \ldots, n$, and $r = 0, 1$ represent the squared envelopes at the
outputs of the $2n$ filters that are tuned to the $2n$ possible transmitted tones. A
decision is made in favor of the code word corresponding to the largest
decision variable of the set $\{U_i\}$.
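The decision rule of (14-6-1) amounts to summing, for each candidate code word, the squared envelope of the tone that the code word would have used in every position. A minimal sketch follows; the (5, 2) code and the envelope values are hypothetical.

```python
def soft_decision_variables(sq_env0, sq_env1, code_words):
    """U_i = sum_j [(1 - c_ij) |y_0j|^2 + c_ij |y_1j|^2], as in (14-6-1)."""
    return [
        sum(e1 if bit else e0 for bit, e0, e1 in zip(cw, sq_env0, sq_env1))
        for cw in code_words
    ]


def soft_decision_decode(sq_env0, sq_env1, code_words):
    """Select the code word with the largest decision variable."""
    U = soft_decision_variables(sq_env0, sq_env1, code_words)
    return code_words[U.index(max(U))]


# Hypothetical (5, 2) linear code and illustrative squared envelopes.
code_words = [
    (0, 0, 0, 0, 0),
    (0, 1, 0, 1, 1),
    (1, 0, 1, 0, 1),
    (1, 1, 1, 1, 0),
]
sq_env0 = [0.1, 0.3, 0.2, 1.5, 0.1]   # |y_0j|^2, j = 1..5
sq_env1 = [2.0, 0.9, 1.1, 0.2, 1.7]   # |y_1j|^2, j = 1..5
U = soft_decision_variables(sq_env0, sq_env1, code_words)
decoded = soft_decision_decode(sq_env0, sq_env1, code_words)
```

Unlike a hard-decision decoder, this rule never quantizes the individual cells, so a weak cell contributes little to every $U_i$ instead of forcing a bit error.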

Our objective in this section is the determination of the error rate
performance of the soft-decision decoder. Toward this end, let us assume that
the all-zero code word $C_1$ is transmitted. The average received signal-to-noise
ratio per tone (cell) is denoted by $\bar\gamma_c$. The total received SNR for the $n$ tones is
$n\bar\gamma_c$ and, hence, the average SNR per bit is

$$
\bar\gamma_b = \frac{n}{k}\, \bar\gamma_c = \frac{\bar\gamma_c}{R_c}
\tag{14-6-2}
$$

where $R_c$ is the code rate.

The decision variable $U_1$ corresponding to the code word $C_1$ is given by
(14-6-1) with $c_{1j} = 0$ for all $j$. The probability that a decision is made in favor of
the $m$th code word is just

$$
\begin{aligned}
P_2(m) &= P(U_m > U_1) = P(U_1 - U_m < 0) \\
       &= P\left[ \sum_{j=1}^{n} (c_{1j} - c_{mj}) \left( |y_{1j}|^2 - |y_{0j}|^2 \right) < 0 \right] \\
       &= P\left[ \sum_{j=1}^{w_m} \left( |y_{0j}|^2 - |y_{1j}|^2 \right) < 0 \right]
\end{aligned}
\tag{14-6-3}
$$

where $w_m$ is the weight of the $m$th code word. But the probability in (14-6-3) is
just the probability of error for square-law combining of binary orthogonal
FSK with $w_m$th-order diversity. That is,

$$
P_2(m) = p^{w_m} \sum_{k=0}^{w_m - 1} \binom{w_m - 1 + k}{k} (1 - p)^k
\tag{14-6-4}
$$

$$
P_2(m) \le p^{w_m} \sum_{k=0}^{w_m - 1} \binom{w_m - 1 + k}{k} = \binom{2w_m - 1}{w_m} p^{w_m}
\tag{14-6-5}
$$

where

$$
p = \frac{1}{2 + \bar\gamma_c} = \frac{1}{2 + R_c \bar\gamma_b}
\tag{14-6-6}
$$

As an alternative, we may use the Chernoff upper bound derived in Section
14-4, which in the present notation is

$$
P_2(m) \le [4p(1 - p)]^{w_m}
\tag{14-6-7}
$$

The sum of the binary error events over the $M - 1$ nonzero-weight code
words gives an upper bound on the probability of error. Thus,

$$
P_M \le \sum_{m=2}^{M} P_2(m)
\tag{14-6-8}
$$

Since the minimum distance of the linear code is equal to the minimum
weight, it follows that

$$
(2 + R_c \bar\gamma_b)^{-w_m} \le (2 + R_c \bar\gamma_b)^{-d_{\min}}
$$

The use of this relation in conjunction with (14-6-5) and (14-6-8) yields a
simple, albeit looser, upper bound that may be expressed in the form

$$
P_M < \frac{\displaystyle \sum_{m=2}^{M} \binom{2w_m - 1}{w_m}}{(2 + R_c \bar\gamma_b)^{d_{\min}}}
\tag{14-6-9}
$$

This simple bound indicates that the code provides an effective order of
diversity equal to $d_{\min}$. An even simpler bound is the union bound

$$
P_M < (M - 1)[4p(1 - p)]^{d_{\min}}
\tag{14-6-10}
$$

which is obtained from the Chernoff bound given in (14-6-7).
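The bounds (14-6-4) through (14-6-10) are straightforward to evaluate once the weights of the code words are known. The sketch below groups code words by weight and uses the standard weight distribution of the extended Golay (24, 12) code (759 words of weight 8, 2576 of weight 12, 759 of weight 16, and one of weight 24) as an example; the function names and the chosen SNR are hypothetical.

```python
import math


def fsk_error_p(snr_per_bit, rate):
    """p = 1 / (2 + R_c * gamma_b), as in (14-6-6); SNR is linear, not dB."""
    return 1.0 / (2.0 + rate * snr_per_bit)


def pairwise_error(p, w):
    """Exact pairwise error probability (14-6-4) for a code word of weight w."""
    return p ** w * sum(math.comb(w - 1 + k, k) * (1.0 - p) ** k for k in range(w))


def pairwise_chernoff(p, w):
    """Chernoff bound (14-6-7) on the pairwise error probability."""
    return (4.0 * p * (1.0 - p)) ** w


def union_bound(p, weight_counts):
    """(14-6-8), with the sum over code words grouped by weight."""
    return sum(count * pairwise_error(p, w) for w, count in weight_counts.items())


def loose_bound(p, weight_counts, d_min):
    """(14-6-9): sum of C(2w-1, w) over code words, times p^d_min."""
    numerator = sum(count * math.comb(2 * w - 1, w) for w, count in weight_counts.items())
    return numerator * p ** d_min


def chernoff_union_bound(p, num_code_words, d_min):
    """(14-6-10): (M - 1) * [4p(1-p)]^d_min."""
    return (num_code_words - 1) * pairwise_chernoff(p, d_min)


# Nonzero-weight code words of the extended Golay (24, 12) code.
GOLAY_WEIGHTS = {8: 759, 12: 2576, 16: 759, 24: 1}

p = fsk_error_p(snr_per_bit=100.0, rate=0.5)      # 20 dB per bit, R_c = 1/2
tight = union_bound(p, GOLAY_WEIGHTS)
loose = loose_bound(p, GOLAY_WEIGHTS, d_min=8)
simplest = chernoff_union_bound(p, 4096, d_min=8)
```

Grouping by weight is only a computational convenience: for a linear code the pairwise error probability depends on the code word solely through its weight $w_m$.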

As an example serving to illustrate the benefits of coding for a Rayleigh
fading channel, we have plotted in Fig. 14-6-2 the performance obtained with
the extended Golay (24, 12) code and the performance of binary FSK and
quaternary FSK each with dual diversity. Since the extended Golay code
requires a total of 48 cells and $k = 12$, the bandwidth expansion factor $B_e = 4$.
This is also the bandwidth expansion factor for binary and quaternary FSK
with $L = 2$. Thus, the three types of waveforms are compared on the basis of
the same bandwidth expansion factor. Note that at $P_b = 10^{-4}$, the Golay code
outperforms quaternary FSK by more than 6 dB, and at $P_b = 10^{-5}$, the
difference is approximately 10 dB.

FIGURE 14-6-2 Example of performance obtained with conventional diversity versus coding for $B_e = 4$. (Horizontal axis: SNR per bit, $\bar\gamma_b$ (dB).)

The reason for the superior performance of the Golay code is its large
minimum distance ($d_{\min} = 8$), which translates into an equivalent eighth-order
($L = 8$) diversity. In contrast, the binary and quaternary FSK signals have only
second-order diversity. Hence, the code makes more efficient use of the
available channel bandwidth. The price that we must pay for the superior
performance of the code is the increase in decoding complexity.


14-6-2 Probability of Error for Hard-Decision Decoding of
Linear Binary Block Codes 

Bounds on the performance obtained with hard-decision decoding of a linear
binary $(n, k)$ code have already been given in Section 8-1-5. These bounds are
applicable to a general binary-input binary-output memoryless (binary
symmetric) channel and, hence, they apply without modification to a Rayleigh
fading AWGN channel with statistically independent fading of the symbols in
the code word. The probability of a bit error needed to evaluate these bounds
when binary FSK with noncoherent detection is used as the modulation and
demodulation technique is given by (14-6-6).

A particularly interesting result is obtained when we use the Chernoff upper
bound on the error probability for hard-decision decoding given by (8-1-89).
That is,

$$
P_2(m) \le [4p(1 - p)]^{w_m/2}
\tag{14-6-11}
$$

and $P_M$ is upper-bounded by (14-6-8). In comparison, the Chernoff upper
bound for $P_2(m)$ when soft-decision decoding is employed is given by (14-6-7).
We observe that the effect of hard-decision decoding is a reduction in the
distance between any two code words by a factor of 2. When the minimum
distance of a code is relatively small, the reduction of the distances by a factor
of 2 is much more noticeable in a fading channel than in a nonfading channel.
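To get a feel for the cost of halving the distance, one can solve the two Chernoff bounds (14-6-7) and (14-6-11) for the SNR per cell that makes each bound equal to a given target. The sketch below does this in closed form using $p = 1/(2 + \bar\gamma_c)$. It compares bounds only, not actual error rates, and the function names and the chosen target are hypothetical.

```python
import math


def snr_per_cell_for_chernoff(target, exponent):
    """Solve [4p(1-p)]^exponent = target for gamma_c, where p = 1/(2 + gamma_c)."""
    x = target ** (1.0 / exponent)           # required value of 4p(1-p)
    p = 0.5 * (1.0 - math.sqrt(1.0 - x))     # root of 4p(1-p) = x with p < 1/2
    return 1.0 / p - 2.0


def soft_hard_gap_db(target, weight):
    """SNR gap in dB between the hard- and soft-decision Chernoff bounds."""
    soft = snr_per_cell_for_chernoff(target, weight)         # exponent w_m
    hard = snr_per_cell_for_chernoff(target, weight / 2.0)   # exponent w_m / 2
    return 10.0 * math.log10(hard / soft)


# Example: a code word of weight 8 and a target bound value of 1e-5.
gap_db = soft_hard_gap_db(1e-5, 8)
```

Because the error probability on a Rayleigh fading channel falls only as an inverse power of the SNR, halving that power requires a large SNR increase, which is the effect described in the paragraph above.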

For illustrative purposes we have plotted in Fig. 14-6-3 the performance of
the Golay (23, 12) code when hard-decision and soft-decision decoding are
used. The difference in performance at $P_b = 10^{-5}$ is approximately 6 dB. This is
a significant difference in performance compared with the 2 dB difference
between soft- and hard-decision decoding in a nonfading AWGN channel. We
also note that the difference in performance increases as $P_b$ decreases. In short,
these results indicate the benefits of soft-decision decoding over
hard-decision decoding on a Rayleigh fading channel.


14-6-3 Upper Bounds on the Performance of Convolutional 
Codes for a Rayleigh Fading Channel 

In this subsection, we derive the performance of binary convolutional codes
when used on a Rayleigh fading AWGN channel. The encoder accepts $k$
binary digits at a time and puts out $n$ binary digits at a time. Thus, the code
rate is $R_c = k/n$. The binary digits at the output of the encoder are transmitted
over the Rayleigh fading channel by means of binary FSK, which is
square-law-detected at the receiver. The decoder for either soft- or
hard-decision decoding performs maximum-likelihood sequence estimation, which is
efficiently implemented by means of the Viterbi algorithm.

FIGURE 14-6-3 Comparison of performance between hard- and soft-decision decoding. (Horizontal axis: SNR per bit, $\bar\gamma_b$ (dB).)

First, we consider soft-decision decoding. In this case, the metrics computed
in the Viterbi algorithm are simply sums of square-law-detected outputs from
the demodulator. Suppose the all-zero sequence is transmitted. Following the
procedure outlined in Section 8-2-3, it is easily shown that the probability of
error in a pairwise comparison of the metric corresponding to the all-zero
sequence with the metric corresponding to another sequence that merges for
the first time at the all-zero state is

$$
P_2(d) = p^d \sum_{k=0}^{d-1} \binom{d - 1 + k}{k} (1 - p)^k
\tag{14-6-12}
$$

where $d$ is the number of bit positions in which the two sequences differ and $p$
is given by (14-6-6). That is, $P_2(d)$ is just the probability of error for binary
FSK with square-law detection and $d$th-order diversity. Alternatively, we may
use the Chernoff bound in (14-6-7) for $P_2(d)$. In any case, the bit error
probability is upper-bounded, as shown in Section 8-2-3, by the expression

$$
P_b < \frac{1}{k} \sum_{d = d_{\text{free}}}^{\infty} \beta_d P_2(d)
\tag{14-6-13}
$$

where the weighting coefficients $\{\beta_d\}$ in the summation are obtained from the
expansion of the first derivative of the transfer function $T(D, N)$, given by
(8-2-25).

When hard-decision decoding is performed at the receiver, the bounds on
the error rate performance for binary convolutional codes derived in Section
8-2-4 apply. That is, $P_b$ is again upper-bounded by the expression in (14-6-13),
where $P_2(d)$ is defined by (8-2-28) for odd $d$ and by (8-2-29) for even $d$, or
upper-bounded (Chernoff bound) by (8-2-31), and $p$ is defined by (14-6-6).

As in the case of block coding, when the respective Chernoff bounds are
used for $P_2(d)$ with hard-decision and soft-decision decoding, it is interesting to
note that the effect of hard-decision decoding is to reduce the distances
(diversity) by a factor of 2 relative to soft-decision decoding.

The following numerical results illustrate the error rate performance of
binary, rate $1/n$, maximal free distance convolutional codes for $n = 2$, 3, and 4
with soft-decision Viterbi decoding. First of all, Fig. 14-6-4 shows the
performance of the rate 1/2 convolutional codes for constraint lengths 3, 4, and
5. The bandwidth expansion factor for binary FSK modulation is $B_e = 2n$.
Since an increase in the constraint length results in an increase in the
complexity of the decoder to go along with the corresponding increase in the
minimum free distance, the system designer can weigh these two factors in the
selection of the code.

FIGURE 14-6-4 Performance of rate 1/2 binary convolutional codes with soft-decision decoding.

Another way to increase the distance without increasing the constraint
length of the code is to repeat each output bit $m$ times. This is equivalent to
reducing the code rate by a factor of $m$ or expanding the bandwidth by the
same factor. The result is a convolutional code that has a minimum free
distance of $m d_{\text{free}}$, where $d_{\text{free}}$ is the minimum free distance of the original code
without repetitions. Such a code is almost as good, from the viewpoint of
minimum distance, as a maximum free distance, rate $1/mn$ code. The error rate
performance with repetitions is upper-bounded by

$$
P_b < \frac{1}{k} \sum_{d = d_{\text{free}}}^{\infty} \beta_d P_2(md)
\tag{14-6-14}
$$

where $P_2(md)$ is given by (14-6-12). Figure 14-6-5 illustrates the performance
of the rate 1/2 codes with repetitions ($m = 1, 2, 3, 4$) for constraint length 5.
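The bounds (14-6-13) and (14-6-14) can be evaluated numerically by truncating the infinite sum. The sketch below does so for one specific code chosen here for illustration: the rate 1/2, constraint length 3 code with octal generators (5, 7), whose transfer function $T(D, N) = D^5 N / (1 - 2DN)$ gives $\beta_d = (d - 4)\,2^{d-5}$ for $d \ge 5$. The truncation point `d_max` and the function names are assumptions of this sketch, and the truncated sum is meaningful only when the series converges (sufficiently high SNR).

```python
import math


def pairwise_error(p, d):
    """P_2(d) of (14-6-12): binary FSK, square-law detection, d-th order diversity."""
    return p ** d * sum(math.comb(d - 1 + k, k) * (1.0 - p) ** k for k in range(d))


def beta(d):
    """beta_d for the rate 1/2, constraint length 3 code with generators (5, 7)."""
    return (d - 4) * 2 ** (d - 5) if d >= 5 else 0


def bit_error_bound(snr_per_bit, m=1, d_free=5, n=2, k=1, d_max=40):
    """Truncated form of (14-6-14); m = 1 gives (14-6-13).

    snr_per_bit is the linear average SNR per information bit. Repeating each
    coded bit m times lowers the code rate to k / (m * n).
    """
    rate = k / (m * n)
    p = 1.0 / (2.0 + rate * snr_per_bit)           # (14-6-6)
    total = sum(beta(d) * pairwise_error(p, m * d) for d in range(d_free, d_max + 1))
    return total / k


# Example at 20 dB SNR per bit (linear value 100), without and with repetition.
bound_m1 = bit_error_bound(100.0, m=1)
bound_m2 = bit_error_bound(100.0, m=2)
```

At this SNR the bound with $m = 2$ is smaller than the bound with $m = 1$: the diversity order doubles from $d_{\text{free}}$ to $2 d_{\text{free}}$, at the price of doubling the bandwidth expansion.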


14-6-4 Use of Constant-Weight Codes and Concatenated
Codes for a Fading Channel

Our treatment of coding for a Rayleigh fading channel to this point was based
on the use of binary FSK as the modulation technique for transmitting each of
the binary digits in a code word. For this modulation technique, all the $2^k$ code
words in the $(n, k)$ code have identical transmitted energy. Furthermore, under
the condition that the fading on the $n$ transmitted tones is mutually statistically
independent and identically distributed, the average received signal energy for
the $M = 2^k$ possible code words is also identical. Consequently, in a
soft-decision decoder, the decision is made in favor of the code word having the
largest decision variable.

FIGURE 14-6-5 Performance of rate $1/2m$, constraint length 5, binary convolutional codes with soft-decision decoding.

The condition that the received code words have identical average SNR has 
an important ramification in the implementation of the receiver. If the received 
code words do not have identical average SNR, the receiver must provide bias 
compensation for each received code word so as to render it equal energy. In 
general, the determination of the appropriate bias terms is difficult to 
implement because it requires the estimation of the average received signal 
power; hence, the equal-energy condition on the received code words 
considerably simplifies the receiver processing. 

There is an alternative modulation method for generating equal-energy
waveforms from code words when the code is constant-weight, i.e., when every
code word has the same number of 1s. Note that such a code is nonlinear.
Nevertheless, suppose we assign a single tone or cell to each bit position of the
$2^k$ code words. Thus, an $(n, k)$ binary block code has $n$ tones assigned.
Waveforms are constructed by transmitting the tone corresponding to a
particular bit in a code word if that bit is a 1; otherwise, that tone is not
transmitted for the duration of the interval. This modulation technique for
transmitting the coded bits is called on-off keying (OOK). Since the code is
constant-weight, say $w$, every coded waveform consists of $w$ transmitted tones
that depend on the positions of the 1s in each of the code words.

As in FSK, all tones in the OOK signal that are transmitted over the 
channel are assumed to fade independently across the frequency band and in 
time from one code word to another. The received signal envelope for each 
tone is described statistically by the Rayleigh distribution. Statistically inde- 
pendent additive white gaussian noise is assumed to be present in each 
frequency cell. 

The receiver employs maximum-likelihood (soft-decision) decoding to map
the received waveform into one of the $M$ possible transmitted code words. For
this purpose, $n$ matched filters are employed, each matched to one of the $n$
frequency tones. For the assumed statistical independence of the signal fading
for the $n$ frequency cells and additive white gaussian noise, the envelopes of
the matched filter outputs are squared and combined to form the $M$ decision
variables

$$
U_i = \sum_{j=1}^{n} c_{ij} |y_j|^2, \qquad i = 1, 2, \ldots, 2^k
\tag{14-6-15}
$$

where $|y_j|^2$ corresponds to the squared envelope of the filter corresponding to
the $j$th frequency, where $j = 1, 2, \ldots, n$.

It may appear that the constant-weight condition severely restricts our
choice of codes. This is not the case, however. To illustrate this point, we
briefly describe some methods for constructing constant-weight codes. This
discussion is by no means exhaustive.

Method 1: Nonlinear Transformation of a Linear Code   In general, if in
each word of an arbitrary binary code we substitute one binary sequence for
every occurrence of a 0 and another sequence for each 1, a constant-weight
binary block code will be obtained if the two substitution sequences are of
equal weights and lengths. If the length of the sequence is $\nu$ and the original
code is an $(n, k)$ code, then the resulting constant-weight code will be a $(\nu n, k)$
code. The weight will be $n$ times the weight of the substitution sequence, and
the minimum distance will be the minimum distance of the original code times
the distance between the two substitution sequences. Thus, the use of
complementary sequences when $\nu$ is even results in a code with minimum
distance $\nu d_{\min}$ and weight $\tfrac{1}{2}\nu n$.

The simplest form of this method is the case $v = 2$, in which every 0 is
replaced by the pair 01 and every 1 is replaced by the complementary sequence
10 (or vice versa). As an example, we take as the initial code the (24, 12)
extended Golay code. The parameters of the original and the resultant
constant-weight code are given in Table 14-6-1.

Note that this substitution process can be viewed as a separate encoding.
This secondary encoding clearly does not alter the information content of a
code word; it merely changes the form in which it is transmitted. Since the
new code word is composed of pairs of bits (one "on" and one "off"), the use
of OOK transmission of this code word produces a waveform that is identical
to that obtained by binary FSK modulation for the underlying linear code.
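
The substitution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the generator matrix `G_HAMMING` is a small (7, 4) Hamming code chosen for brevity (the text's own example uses the (24, 12) extended Golay code), and the function names are hypothetical. The sketch checks that the 0 -> 01, 1 -> 10 substitution doubles the block length, gives every word the same weight n, and doubles the minimum distance.

```python
from itertools import product

# Substitution sequences for v = 2: every 0 becomes 01, every 1 becomes 10.
SUBSTITUTION = {0: (0, 1), 1: (1, 0)}


def encode_linear(message, generator):
    """Encode a message with a binary generator matrix (arithmetic mod 2)."""
    n = len(generator[0])
    return tuple(
        sum(m * row[j] for m, row in zip(message, generator)) % 2
        for j in range(n)
    )


def to_constant_weight(word):
    """Apply the 0 -> 01, 1 -> 10 substitution to one code word."""
    return tuple(bit for symbol in word for bit in SUBSTITUTION[symbol])


def min_distance(code):
    """Smallest Hamming distance between any two distinct code words."""
    words = list(code)
    return min(
        sum(a != b for a, b in zip(u, v))
        for i, u in enumerate(words)
        for v in words[i + 1:]
    )


# Illustrative (7, 4) Hamming code in systematic form (an assumption, chosen
# only because it is small; it is not the code used in the text's example).
G_HAMMING = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 0, 1, 1),
    (0, 0, 1, 0, 1, 1, 1),
    (0, 0, 0, 1, 1, 0, 1),
]

original = [encode_linear(m, G_HAMMING) for m in product((0, 1), repeat=4)]
constant_weight = [to_constant_weight(w) for w in original]

weights = {sum(w) for w in constant_weight}
print("original d_min:", min_distance(original))
print("new block length:", len(constant_weight[0]))
print("set of weights:", weights)
print("new d_min:", min_distance(constant_weight))
```

The number of code words is unchanged by the substitution, which is why k stays the same while n doubles.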

Method 2: Expurgation In this method, we start with an arbitrary binary
block code and select from it a subset consisting of all words of a certain
weight. Several different constant-weight codes can be obtained from one
initial code by varying the choice of the weight w. Since the code words of the
resulting expurgated code can be viewed as a subset of all possible permutations
of any one code word in the set, the term binary expurgated permutation
modulation (BEXPERM) has been used by Gaarder (1971) to describe such a
code. In fact, the constant-weight binary block codes constructed by the other


TABLE 14-6-1  EXAMPLE OF CONSTANT-WEIGHT CODE FORMED BY METHOD 1

Code parameters    Original Golay    Constant-weight
n                  24                48
k                  12                12
M                  4096              4096
d_min              8                 16
w                  variable          24


TABLE 14-6-2  EXAMPLES OF CONSTANT-WEIGHT CODES FORMED BY EXPURGATION

Parameters    Original    Constant weight No. 1    Constant weight No. 2
n             24          24                       24
k             12          9                        11
M             4096        759                      2576
d_min         8           ≥8                       ≥8
w             variable    8                        12


methods may also be viewed as BEXPERM codes. This method of generating
constant-weight codes is in a sense opposite to the first method in that the
word length n is held constant and the code size M is changed. The minimum
distance for the constant-weight subset will clearly be no less than that of the
original code. As an example, we consider the Golay (24, 12) code and form
the two different constant-weight codes shown in Table 14-6-2.
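
The expurgation example can be reproduced numerically. The sketch below is not taken from the text: it assumes the extended Golay code is built from the cyclic (23, 12) Golay code with generator polynomial $g(x) = x^{11} + x^{10} + x^6 + x^5 + x^4 + x^2 + 1$ plus an overall parity bit, and the helper names are hypothetical. It lists the weight distribution and the sizes of the weight-8 and weight-12 subsets, taking k as the largest integer with $2^k \le M$.

```python
from collections import Counter

# Generator polynomial of the (23, 12) binary Golay code (assumed form):
# g(x) = x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1; bit i is the x^i coefficient.
GOLAY_GENERATOR = 0b110001110101


def gf2_multiply(a, b):
    """Multiply two binary polynomials (carry-less product over GF(2))."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def extended_golay_codebook():
    """All 4096 words of the (24, 12) extended Golay code as 24-bit integers."""
    codebook = []
    for message in range(1 << 12):
        word = gf2_multiply(message, GOLAY_GENERATOR)  # 23-bit cyclic code word
        parity = bin(word).count("1") & 1              # overall parity bit
        codebook.append((word << 1) | parity)
    return codebook


def expurgate(codebook, weight):
    """Keep only the code words of the requested Hamming weight."""
    return [w for w in codebook if bin(w).count("1") == weight]


codebook = extended_golay_codebook()
print(sorted(Counter(bin(w).count("1") for w in codebook).items()))
for weight in (8, 12):
    subset = expurgate(codebook, weight)
    k = len(subset).bit_length() - 1      # largest k with 2**k <= M
    print(f"weight {weight}: M = {len(subset)}, usable k = {k}")
```

Because each subset is drawn from a code with minimum distance 8, any two of its words are still at least 8 positions apart, which is the "no less than that of the original code" property stated above.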


Method 3: Hadamard Matrices This method might appear to form a
constant-weight binary block code directly, but it actually is a special case of
the method of expurgation. In this method, a Hadamard matrix is formed as
described in Section 8-1-2, and a constant-weight code is created by selection
of rows (code words) from this matrix. Recall that a Hadamard matrix is an
n × n matrix (n an even integer) of 1s and 0s with the property that any row
differs from any other row in exactly $\tfrac{1}{2}n$ positions. One row of the matrix is
normally chosen as being all 0s.

In each of the other rows, half of the elements are 0s and the other half 1s.
A Hadamard code of size 2(n − 1) code words is obtained by selecting these
n − 1 rows and their complements. By selecting $M = 2^k \le 2(n - 1)$ of these
code words, we obtain a Hadamard code, which we denote by H(n, k), where
each code word conveys k information bits. The resulting code has constant
weight $\tfrac{1}{2}n$ and minimum distance $d_{\min} = \tfrac{1}{2}n$.
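
A minimal sketch of this construction follows. It assumes the Sylvester (recursive doubling) form of the Hadamard matrix, which only covers n equal to a power of two; the text's Section 8-1-2 may use a more general construction, and the function names here are hypothetical.

```python
from itertools import combinations


def sylvester_hadamard(n):
    """0/1 Hadamard matrix of order n (n a power of two), first row all 0s."""
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch only handles n = 2, 4, 8, ...")
    rows = [[0]]
    while len(rows) < n:
        # Sylvester step: [[H, H], [H, complement(H)]]
        rows = [r + r for r in rows] + [r + [1 - b for b in r] for r in rows]
    return rows


def hadamard_code(n, k):
    """Select M = 2**k constant-weight words from the 2(n - 1) candidates."""
    matrix = sylvester_hadamard(n)
    candidates = []
    for row in matrix[1:]:                      # skip the all-0s row
        candidates.append(tuple(row))
        candidates.append(tuple(1 - b for b in row))
    if 2 ** k > len(candidates):
        raise ValueError("need 2**k <= 2(n - 1)")
    return candidates[: 2 ** k]


def hamming(u, v):
    """Hamming distance between two equal-length words."""
    return sum(a != b for a, b in zip(u, v))


code = hadamard_code(8, 3)
print("set of weights:", {sum(w) for w in code})
print("d_min:", min(hamming(u, v) for u, v in combinations(code, 2)))
```

Which $2^k$ of the 2(n − 1) candidates are kept is a free design choice; the sketch simply takes the first ones.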

Since n frequency cells are used to transmit k information bits, the
bandwidth expansion factor for the Hadamard H(n, k) code is defined as

$$B_e = \frac{n}{k} \quad \text{cells per information bit}$$

which is simply the reciprocal of the code rate. Also, the average signal-to-noise
ratio (SNR) per bit, denoted by $\bar{\gamma}_b$, is related to the average SNR per
cell, $\bar{\gamma}_c$, by the expression

$$\bar{\gamma}_c = \frac{k}{\tfrac{1}{2}n}\,\bar{\gamma}_b = 2\,\frac{k}{n}\,\bar{\gamma}_b = 2R_c\bar{\gamma}_b = \frac{2\bar{\gamma}_b}{B_e} \tag{14-6-16}$$

Let us compare the performance of the constant-weight Hadamard codes
under a fixed bandwidth constraint with a conventional M-ary orthogonal set
of waveforms where each waveform has diversity L. The M orthogonal
waveforms with diversity are equivalent to a block orthogonal code having a
block length n = LM and $k = \log_2 M$. For example, if M = 4 and L = 2, the
code words of the block orthogonal code are

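$$\begin{aligned}
\mathbf{C}_1 &= [1\ 1\ 0\ 0\ 0\ 0\ 0\ 0] \\
\mathbf{C}_2 &= [0\ 0\ 1\ 1\ 0\ 0\ 0\ 0] \\
\mathbf{C}_3 &= [0\ 0\ 0\ 0\ 1\ 1\ 0\ 0] \\
\mathbf{C}_4 &= [0\ 0\ 0\ 0\ 0\ 0\ 1\ 1]
\end{aligned}$$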

To transmit these code words using OOK modulation requires n = 8 cells, and
since each code word conveys k = 2 bits of information, the bandwidth
expansion factor is $B_e = 4$. In general, we denote the block orthogonal code as
O(n, k). The bandwidth expansion factor is

$$B_e = \frac{n}{k} = \frac{LM}{k} \tag{14-6-17}$$

Also, the SNR per bit is related to the SNR per cell by the expression

$$\bar{\gamma}_c = \frac{k}{L}\,\bar{\gamma}_b = M R_c \bar{\gamma}_b = \frac{M\bar{\gamma}_b}{B_e} \tag{14-6-18}$$
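
As a quick arithmetic check of the two sets of relations, the sketch below evaluates the bandwidth expansion factor and the SNR per cell for both code families. The function names are hypothetical, SNR values are linear ratios (not dB), and M is assumed to be a power of two.

```python
from fractions import Fraction


def hadamard_parameters(n, k, snr_per_bit):
    """Bandwidth expansion and SNR per cell for H(n, k), following (14-6-16)."""
    b_e = Fraction(n, k)
    return b_e, 2.0 * snr_per_bit / float(b_e)


def orthogonal_parameters(M, L, snr_per_bit):
    """Bandwidth expansion and SNR per cell for the block orthogonal code
    with M words and diversity L, following (14-6-17) and (14-6-18)."""
    k = M.bit_length() - 1              # log2(M) for M a power of two
    b_e = Fraction(L * M, k)
    return b_e, M * snr_per_bit / float(b_e)


# H(20, 5) and the M = 4, L = 2 example from the text, at an SNR per bit of 10.
print(hadamard_parameters(20, 5, 10.0))
print(orthogonal_parameters(4, 2, 10.0))
```

Both example codes have a bandwidth expansion factor of 4, but the Hadamard code spreads the same energy over more cells, so its SNR per cell is lower.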

Now we turn our attention to the performance characteristics of these
codes. First, the exact probability of a code word (symbol) error for M-ary
orthogonal signaling over a Rayleigh fading channel with diversity was given in
closed form in Section 14-4. As previously indicated, this expression is rather
cumbersome to evaluate, especially if either L or M or both are large. Instead,
we shall use a union bound that is very convenient. That is, for a set of M
orthogonal waveforms, the probability of a symbol error can be upper-bounded as

$$P_M < (M - 1)P_2(L) = (2^k - 1)P_2(L) < 2^k P_2(L) \tag{14-6-19}$$

where $P_2(L)$, the probability of error for two orthogonal waveforms, each with
diversity L, is given by (14-6-12) with $p = 1/(2 + \bar{\gamma}_c)$. The probability of bit
error is obtained by multiplying $P_M$ by $2^{k-1}/(2^k - 1)$, as explained previously.

A simple upper (union) bound on the probability of a code word error for
the Hadamard H(n, k) code is obtained by noting that the probability of error in
deciding between the transmitted code word and any other code word is
bounded from above by $P_2(\tfrac{1}{2}d_{\min})$, where $d_{\min}$ is the minimum distance of the
code. Therefore, an upper bound on $P_M$ is

$$P_M < (M - 1)P_2(\tfrac{1}{2}d_{\min}) < 2^k P_2(\tfrac{1}{2}d_{\min}) \tag{14-6-20}$$

Thus the "effective order of diversity" of the code for OOK modulation is
$\tfrac{1}{2}d_{\min}$. The bit error probability may be approximated as $\tfrac{1}{2}P_M$, or slightly
overbounded by multiplying $P_M$ by the factor $2^{k-1}/(2^k - 1)$, which is the factor
used above for orthogonal codes. The latter was selected for the error
probability computations given below.

FIGURE 14-6-6 Performance of Hadamard codes. (Horizontal axis: SNR per bit, $\bar{\gamma}_b$, in dB.)

FIGURE 14-6-7 Performance of block orthogonal codes. (Horizontal axis: SNR per bit, $\bar{\gamma}_b$, in dB.)

FIGURE 14-6-8 Comparison of performance between Hadamard codes and block orthogonal codes.

Figures 14-6-6 and 14-6-7 illustrate the error rate performance of a selected
number of Hadamard codes and block orthogonal codes, respectively, for
several bandwidth expansion factors. The advantage resulting from an increase
in the size M of the alphabet (or k, since $k = \log_2 M$) and an increase in the
bandwidth expansion factor is apparent from observation of these curves.
Note, for example, that the H(20, 5) code when repeated twice results in a
code that is denoted by $_2H(20, 5)$ and has a bandwidth expansion factor $B_e = 8$.
Figure 14-6-8 shows the performance of the two types of codes compared on
the basis of equal bandwidth expansion factors. It is observed that the error
rate curves for the Hadamard codes are steeper than the corresponding curves
for the block orthogonal codes. This characteristic behavior is due simply to
the fact that, for the same bandwidth expansion factor, the Hadamard codes
provide more diversity than block orthogonal codes. Alternatively, one may
say that Hadamard codes provide better bandwidth efficiency than block
orthogonal codes. It should be mentioned, however, that at low SNR, a
lower-diversity code outperforms a higher-diversity code as a consequence of
the fact that, on a Rayleigh fading channel, there is an optimum distribution of
the total received SNR among the diversity signals. Therefore, the curves for
the block orthogonal codes will cross over the curves for the Hadamard codes
at the low-SNR (high-error-rate) region.
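
The two union bounds are easy to evaluate numerically. Equation (14-6-12) is not reproduced in this section, so the sketch below assumes the usual closed form for binary orthogonal signaling with square-law combining of L Rayleigh-faded branches, $P_2(L) = p^L \sum_{i=0}^{L-1} \binom{L-1+i}{i}(1-p)^i$ with $p = 1/(2 + \bar{\gamma}_c)$. The function names are hypothetical and all SNR values are linear ratios.

```python
from math import comb


def p2(L, snr_per_cell):
    """Binary error probability with L-fold diversity (assumed form of (14-6-12))."""
    p = 1.0 / (2.0 + snr_per_cell)
    return p ** L * sum(comb(L - 1 + i, i) * (1.0 - p) ** i for i in range(L))


def bit_error_bound(k, diversity, snr_per_cell):
    """Symbol union bound (M - 1) * P2, scaled by 2^(k-1) / (2^k - 1)."""
    M = 2 ** k
    symbol_bound = (M - 1) * p2(diversity, snr_per_cell)
    return symbol_bound * 2 ** (k - 1) / (M - 1)


def hadamard_bound(n, k, snr_per_bit, repeats=1):
    """Bit error bound for H(n, k) repeated `repeats` times (n a multiple of 4)."""
    diversity = repeats * n // 4                               # d_min / 2
    snr_per_cell = 2.0 * k * snr_per_bit / (repeats * n)       # (14-6-16)
    return bit_error_bound(k, diversity, snr_per_cell)


def orthogonal_bound(M, L, snr_per_bit):
    """Bit error bound for the block orthogonal code with M words, diversity L."""
    k = M.bit_length() - 1
    snr_per_cell = k * snr_per_bit / L                         # (14-6-18)
    return bit_error_bound(k, L, snr_per_cell)


# Two codes with the same bandwidth expansion factor of 4, at 20 dB per bit.
snr_per_bit = 10 ** (20 / 10)
print("H(20, 5):", hadamard_bound(20, 5, snr_per_bit))
print("O(8, 2): ", orthogonal_bound(4, 2, snr_per_bit))
```

Sweeping `snr_per_bit` downward is a simple way to look for the crossover between the two families that the text describes at low SNR.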


Method 4: Concatenation In this method, we begin with two codes: one
binary and the other nonbinary. The binary code is the inner code and is an
(n, k) constant-weight (nonlinear) block code. The nonbinary code, which may
be linear, is the outer code. To distinguish it from the inner code, we use
uppercase letters, e.g., an (N, K) code, where N and K are measured in terms
of symbols from a q-ary alphabet. The size q of the alphabet over which the
outer code is defined cannot be greater than the number of words in the inner
code. The outer code, when defined in terms of the binary inner code words
rather than q-ary symbols, is the new code.

An important special case is obtained when $q = 2^k$ and the inner code size is
chosen to be $2^k$. Then the number of words is $M = 2^{kK}$ and the concatenated
structure is an (nN, kK) code. The bandwidth expansion factor of this
concatenated code is the product of the bandwidth expansions for the inner
and outer codes.

Now we shall demonstrate the performance advantages obtained on a
Rayleigh fading channel by means of code concatenation. Specifically, we
construct a concatenated code in which the outer code is a dual-k (nonbinary)
convolutional code and the inner code is either a Hadamard code or a block
orthogonal code. That is, we view the dual-k code with M-ary ($M = 2^k$)
orthogonal signals for modulation as a concatenated code. In all cases to be
considered, soft-decision demodulation and Viterbi decoding are assumed.

The error rate performance of the dual-k convolutional codes is obtained
from the derivation of the transfer function given by (8-2-39). For a rate-1/2,
dual-k code with no repetitions, the bit error probability, appropriate for the
case in which each k-bit output symbol from the dual-k encoder is mapped into
one of $M = 2^k$ orthogonal code words, is upper-bounded as

$$P_b < \frac{2^{k-1}}{2^k - 1} \sum_{m=4}^{\infty} \beta_m P_2(m) \tag{14-6-21}$$

where $P_2(m)$ is given by (14-6-12) and the coefficients $\beta_m$ are obtained
from the transfer function (8-2-39).

For example, a rate-1/2, dual-2 code may employ a 4-ary orthogonal code
O(4, 2) as the inner code. The bandwidth expansion factor of the resulting
concatenated code is, of course, the product of the bandwidth expansion
factors of the inner and outer codes. Thus, in this example, the rate of the
outer code is 1/2 and the rate of the inner code is 1/2. Hence, $B_e = (4/2)(2) = 4$.

Note that if every symbol of the dual-k code is repeated r times, this is equivalent
to using an orthogonal code with diversity L = r. If we select r = 2 in the
example given above, the resulting orthogonal code is denoted as O(8, 2) and
the bandwidth expansion factor for the rate-1/2, dual-2 code becomes $B_e = 8$.
Consequently, the term $P_2(m)$ in (14-6-21) must be replaced by $P_2(mL)$ when
the orthogonal code has diversity L. Since a Hadamard code has an "effective
diversity" $\tfrac{1}{2}d_{\min}$, it follows that when a Hadamard code is used as the inner
code with a dual-k outer code, the upper bound on the bit error probability of
the resulting concatenated code given by (14-6-21) still applies if $P_2(m)$ is
replaced by $P_2(\tfrac{1}{2}m\,d_{\min})$. With these modifications, the upper bound on the bit
error probability given by (14-6-21) has been evaluated for rate-1/2, dual-k
convolutional codes with either Hadamard codes or block orthogonal codes as
inner codes. Thus the resulting concatenated code has a bandwidth expansion
factor equal to twice the bandwidth expansion factor of the inner code.

First, we consider the performance gains due to code concatenation. Figure
14-6-9 illustrates the performance of dual-k codes with block orthogonal inner
codes compared with the performance of block orthogonal codes for bandwidth
expansion factors $B_e = 4$, 8, 16, and 32. The performance gains due to
concatenation are very impressive. For example, at an error rate of $10^{-6}$ and
$B_e = 8$, the dual-k code outperforms the orthogonal block code by 7.5 dB. In
short, this gain may be attributed to the increased diversity (increase in
minimum distance) obtained via code concatenation. Similarly, Fig. 14-6-10
illustrates the performance of two dual-k codes with Hadamard inner codes
compared with the performance of the Hadamard codes alone for $B_e = 8$ and
12. It is observed that the performance gains due to code concatenation are
still significant, but certainly not as impressive as those illustrated in Fig.
14-6-9. The reason is that the Hadamard codes alone yield a large diversity, so
that the increased diversity arising from concatenation does not result in as
large a gain in performance for the range of error rates covered in Fig. 14-6-10.

FIGURE 14-6-9 Comparison of performance between block orthogonal codes and dual-k codes with block orthogonal inner codes.

FIGURE 14-6-10 Comparison of performance between Hadamard codes and dual-k codes with Hadamard inner codes.

Next, we compare the performance for the two types of inner codes used
with dual-k outer codes. Figure 14-6-11 shows the comparison for $B_e = 8$. Note
that the $_2H(4, 2)$ inner code has $d_{\min} = 4$, and, hence, it has an effective order
of diversity equal to 2. But this dual diversity is achieved by transmitting four
frequencies per code word. On the other hand, the orthogonal code O(8, 2)
also gives dual diversity, but this is achieved by transmitting only two
frequencies per code word. Consequently, the O(8, 2) code is 3 dB better than
the $_2H(4, 2)$. This difference in performance is maintained when the two codes
are used as inner codes in conjunction with the dual-2 code. On the other hand, for
$B_e = 8$, one can use the H(20, 5) as the inner code of a dual-5 code, and its
performance is significantly better than that of the dual-2 code at low error
rates. This improvement in performance is achieved at the expense of an
increase in decoding complexity. Similarly, in Fig. 14-6-12, we compare the
performance of the dual-k codes with two types of inner codes for $B_e = 16$.
Note that the $_3H(8, 3)$ inner code has $d_{\min} = 12$, and, hence, it yields an
effective diversity of 6. This diversity is achieved by transmitting 12 frequencies
per code word. The orthogonal inner code O(24, 3) gives only third-order
diversity, which is achieved by transmitting three frequencies per code word.
Consequently the O(24, 3) inner code is more efficient at low SNR, that is, for
the range of error rates covered in Fig. 14-6-12. At large SNR, the dual-3 code
with the Hadamard $_3H(8, 3)$ inner code outperforms its counterpart with the
O(24, 3) inner code due to the large diversity provided by the Hadamard code.
For the same bandwidth expansion factor $B_e = 16$, one may use a dual-6 code
with an H(48, 6) code to achieve an improvement over the dual-3 code with the
$_3H(8, 3)$ inner code. Again, this improvement in performance (which in this
case is not as impressive as that shown in Fig. 14-6-11) must be weighed
against the increased decoding complexity inherent in the dual-6 code.

FIGURE 14-6-11 Performance of dual-k codes with either Hadamard or block orthogonal inner code for $B_e = 8$. (Horizontal axis: SNR per bit, $\bar{\gamma}_b$, in dB.)

FIGURE 14-6-12 Performance of dual-k codes with either Hadamard or block orthogonal inner code for $B_e = 16$. (Horizontal axis: SNR per bit, $\bar{\gamma}_b$, in dB.)

The numerical results given above illustrate the performance advantages in
using codes with good distance properties and soft-decision decoding on a
Rayleigh fading channel as an alternative to conventional M-ary orthogonal
signaling with diversity. In addition, the results illustrate the benefits of code
concatenation on such a channel, using a dual-k convolutional code as the
outer code and either a Hadamard code or a block orthogonal code as the
inner code. Although dual-k codes were used for the outer code, similar results
are obtained when a Reed-Solomon code is used for the outer code. There is
an even greater choice in the selection of the inner code.

The important parameter in the selection of both the outer and the inner 
codes is the minimum distance of the resultant concatenated code required to 
achieve a specified level of performance. Since many codes will meet the 
performance requirements, the ultimate choice is made on the basis of 
decoding complexity and bandwidth requirements. 


14-6-5 System Design Based on the Cutoff Rate 

In the above treatment of coded waveforms, we have demonstrated the 
effectiveness of various codes for fading channels. In particular, we have 
observed the benefits of soft-decision decoding and code concatenation as a 
means for increasing the minimum distance and, hence, the amount of diversity 
in the coded waveforms. In this subsection, we consider randomly selected 
code words and derive an upper (union) bound on the error probability that 
depends on the cutoff rate parameter for the Rayleigh fading channel. 

Let us consider the model for the communication system illustrated in Fig. 
14-6-1. The modulator has a q-ary orthogonal FSK alphabet. Code words of 
block length n are mapped into waveforms by selecting n tones from the 
alphabet of q tones. The demodulation is performed by passing the signal 
through a bank of q matched filters followed by square-law detectors. The 
decoding is assumed to be soft-decision. Thus, the square-law detected outputs
from the demodulator are appropriately combined (added) with equal weighting
to form M decision variables corresponding to the M possible transmitted
code words.

To evaluate the union bound on the probability of error in a Rayleigh
fading channel with AWGN, we first evaluate the binary error probability
involving the decision variable $U_1$, which corresponds to the transmitted code
word, and any of the other M − 1 decision variables corresponding to the other
code words. Let $U_2$ be the other decision variable and suppose that $U_1$ and $U_2$
have $l$ tones in common. Hence, the contributions to $U_1$ and $U_2$ from these $l$
tones are identical and, therefore, cancel out when we form the difference
$U_1 - U_2$. Since the two decision variables differ in $n - l$ tones, the probability of
error is simply that for a binary orthogonal FSK system with $(n - l)$th-order
diversity. The exact form for this probability of error is given by (14-6-4),
where $p = 1/(2 + \bar{\gamma}_c)$, and $\bar{\gamma}_c$ is the average SNR per tone. For simplicity, we
choose to use the Chernoff bound for this binary event error probability, given
by (14-6-7), i.e.,

$$P_2(U_1, U_2 \mid l) \le [4p(1 - p)]^{n-l} \tag{14-6-22}$$

Now, let us average over the ensemble of binary communication systems.
There are $q^n$ possible code words, from which we randomly select two code
words. Thus, each code word is selected with equal probability. Then, the
probability that two randomly selected code words have $l$ tones in common is

$$P(l) = \binom{n}{l}\left(\frac{1}{q}\right)^{l}\left(1 - \frac{1}{q}\right)^{n-l} \tag{14-6-23}$$

When we average (14-6-22) over the probability distribution of $l$ given by
(14-6-23), we obtain

$$\bar{P}_2(U_1, U_2) = \sum_{l=0}^{n} P_2(U_1, U_2 \mid l)\,P(l) \le \left\{\frac{1}{q}\left[1 + 4(q - 1)p(1 - p)\right]\right\}^{n} \tag{14-6-24}$$
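
The closed form in (14-6-24) follows from the binomial theorem applied to the sum over $l$. As a sanity check, the sketch below (hypothetical function names, linear SNR values) evaluates the sum of the Chernoff terms weighted by the binomial probabilities and compares it with the closed-form expression.

```python
from math import comb, isclose


def chernoff_pairwise(n, l, p):
    """Chernoff bound (14-6-22) for two code words sharing l of n tones."""
    return (4.0 * p * (1.0 - p)) ** (n - l)


def prob_common_tones(n, l, q):
    """Binomial probability (14-6-23) that two random words share l tones."""
    return comb(n, l) * (1.0 / q) ** l * (1.0 - 1.0 / q) ** (n - l)


def averaged_bound_sum(n, q, snr_per_tone):
    """Left-hand side: explicit average over l = 0, ..., n."""
    p = 1.0 / (2.0 + snr_per_tone)
    return sum(
        chernoff_pairwise(n, l, p) * prob_common_tones(n, l, q)
        for l in range(n + 1)
    )


def averaged_bound_closed_form(n, q, snr_per_tone):
    """Right-hand side of (14-6-24)."""
    p = 1.0 / (2.0 + snr_per_tone)
    return ((1.0 + 4.0 * (q - 1) * p * (1.0 - p)) / q) ** n


n, q, snr = 10, 4, 3.0
print(averaged_bound_sum(n, q, snr), averaged_bound_closed_form(n, q, snr))
print(isclose(averaged_bound_sum(n, q, snr), averaged_bound_closed_form(n, q, snr)))
```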


Finally, the union bound for communication systems that use $M = 2^k$
randomly selected code words is simply

$$P_M \le (M - 1)\bar{P}_2(U_1, U_2) < M\bar{P}_2(U_1, U_2) \tag{14-6-25}$$

By combining (14-6-24) with (14-6-25), we obtain the upper bound on the
symbol error probability as

$$P_M < 2^{-n(R_0 - R_c)} \tag{14-6-26}$$

where $R_c = k/n$ is the code rate and $R_0$ is the cutoff rate defined as

$$R_0 = \log_2 \frac{q}{1 + 4(q - 1)p(1 - p)} \tag{14-6-27}$$

with

$$p = \frac{1}{2 + \bar{\gamma}_c} \tag{14-6-28}$$

Graphs of $R_0$ as a function of $\bar{\gamma}_c$ are shown in Fig. 14-6-13 for q = 2, 4,
and 8.

FIGURE 14-6-13 Cutoff rate as a function of $\bar{\gamma}_c$ for Rayleigh fading channel.

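A more interesting form of (14-6-26) is obtained if we express $P_M$ in terms
of the SNR per bit. In particular, (14-6-26) may be expressed as

$$P_M < 2^{-k[\bar{\gamma}_b\,g(q, \bar{\gamma}_c) - 1]} \tag{14-6-29}$$

where, by definition,

$$g(q, \bar{\gamma}_c) = \frac{R_0}{\bar{\gamma}_c} = \frac{1}{\bar{\gamma}_c}\log_2\left[\frac{q}{1 + 4(q - 1)p(1 - p)}\right] \tag{14-6-30}$$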
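
The step from (14-6-26) to (14-6-29) is not spelled out. One way to fill it in, assuming that the energy of the k information bits is shared equally by the n tones of a code word, so that $k\bar{\gamma}_b = n\bar{\gamma}_c$, is to rewrite the exponent as follows:

```latex
\begin{aligned}
n(R_0 - R_c) &= nR_0 - k
  && \text{since } R_c = k/n \\
&= \frac{k\bar{\gamma}_b}{\bar{\gamma}_c}\,R_0 - k
  && \text{since } n = k\bar{\gamma}_b/\bar{\gamma}_c \\
&= k\left[\bar{\gamma}_b\,g(q, \bar{\gamma}_c) - 1\right]
  && \text{with } g(q, \bar{\gamma}_c) = R_0/\bar{\gamma}_c
\end{aligned}
```

Under this assumption the bound decays with k whenever $\bar{\gamma}_b\,g(q, \bar{\gamma}_c) > 1$, which is why the maximum of g sets the minimum SNR per bit.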



FIGURE 14-6-14 Graph of function $g(q, \bar{\gamma}_c)$.





Graphs of $g(q, \bar{\gamma}_c)$ as a function of $\bar{\gamma}_c$ are plotted in Fig. 14-6-14, with q as a
parameter. First, we note that there is an optimum $\bar{\gamma}_c$ for each value of q that
minimizes the probability of error. For large q, this value is approximately
$\bar{\gamma}_c = 3$ (5 dB), which is consistent with our previous observation for ordinary
square-law diversity combining. Furthermore, as $q \to \infty$, the function $g(q, \bar{\gamma}_c)$
approaches a limit, which is

$$\lim_{q \to \infty} g(q, \bar{\gamma}_c) = g_\infty(\bar{\gamma}_c) = \frac{1}{\bar{\gamma}_c}\log_2\left[\frac{(2 + \bar{\gamma}_c)^2}{4(1 + \bar{\gamma}_c)}\right] \tag{14-6-31}$$

The value of $g_\infty(\bar{\gamma}_c)$ evaluated at $\bar{\gamma}_c = 3$ is

$$g_\infty(3) = \max_{\bar{\gamma}_c} g_\infty(\bar{\gamma}_c) = 0.215 \tag{14-6-32}$$

Therefore, the error probability in (14-6-29) for this optimum division of total
SNR is

$$P_M < 2^{-0.215k(\bar{\gamma}_b - 4.65)} \tag{14-6-33}$$

This result indicates that the probability of error can be made arbitrarily small
with optimum SNR per code chip, if the average SNR per bit $\bar{\gamma}_b > 4.65$
(6.7 dB). Even a relatively modest value of q = 20 comes close to this minimum
value. As seen from Fig. 14-6-14, g(20, 3) = 0.2, so that $P_M \to 0$, provided
$\bar{\gamma}_b > 5$ (7 dB). On the other hand, if q = 2, the maximum value of
$g(2, \bar{\gamma}_c) \approx 0.096$ and the corresponding minimum SNR per bit is 10.2 dB.
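
The numbers quoted above can be checked by evaluating (14-6-27), (14-6-30), and (14-6-31) directly. The sketch below uses hypothetical function names and a simple grid search over linear SNR values; the grid range and step are arbitrary choices, not from the text.

```python
from math import log2, log10


def cutoff_rate(q, snr_per_tone):
    """R0 from (14-6-27) with p = 1 / (2 + snr_per_tone)."""
    p = 1.0 / (2.0 + snr_per_tone)
    return log2(q / (1.0 + 4.0 * (q - 1) * p * (1.0 - p)))


def g(q, snr_per_tone):
    """g(q, snr) = R0 / snr, as in (14-6-30)."""
    return cutoff_rate(q, snr_per_tone) / snr_per_tone


def g_limit(snr_per_tone):
    """Large-q limit of g, as in (14-6-31)."""
    ratio = (2.0 + snr_per_tone) ** 2 / (4.0 * (1.0 + snr_per_tone))
    return log2(ratio) / snr_per_tone


def best_split(q, grid=None):
    """Grid search for the SNR per tone that maximizes g(q, .)."""
    grid = grid or [0.01 * i for i in range(10, 2001)]   # 0.1 to 20, linear
    snr = max(grid, key=lambda s: g(q, s))
    return snr, g(q, snr)


for q in (2, 20):
    snr, peak = best_split(q)
    print(f"q = {q}: best SNR per tone ~ {snr:.2f}, g ~ {peak:.3f}, "
          f"minimum SNR per bit ~ {10 * log10(1 / peak):.1f} dB")
print(f"g_inf(3) ~ {g_limit(3.0):.3f}")
```

The minimum SNR per bit is taken as the reciprocal of the peak of g, since the exponent in (14-6-29) is negative only when $\bar{\gamma}_b\,g > 1$.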

In the case of binary FSK waveforms (q = 2), we may easily compare the
cutoff rate for the unquantized (soft-decision) demodulator output with the
cutoff rate for binary quantization, for which

$$R_Q = 1 - \log_2\left[1 + \sqrt{4p(1 - p)}\right], \qquad Q = 2$$

as was given in (8-1-104). Figure 14-6-15 illustrates the graphs for $R_0$ and $R_Q$.
Note that the difference between $R_0$ and $R_Q$ is approximately 3 dB for rates
below 0.3 and the difference increases rapidly at high rates. This loss may be
reduced significantly by increasing the number of quantization levels to Q = 8
(three bits).

Similar comparisons in the relative performance between unquantized
soft-decision decoding and quantized decision decoding can also be made for
q > 2.
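
The soft-versus-hard comparison for q = 2 can be made concrete by asking how much more SNR per tone the hard-decision receiver needs to reach the same cutoff rate. The sketch below is illustrative only: it assumes the same $p = 1/(2 + \bar{\gamma}_c)$ in both expressions, uses a simple bisection, and all function names are hypothetical.

```python
from math import log2, log10, sqrt


def r0_soft(snr_per_tone):
    """Unquantized cutoff rate (14-6-27) specialized to q = 2."""
    p = 1.0 / (2.0 + snr_per_tone)
    return 1.0 - log2(1.0 + 4.0 * p * (1.0 - p))


def r_hard(snr_per_tone):
    """Cutoff rate R_Q for binary (Q = 2) quantization."""
    p = 1.0 / (2.0 + snr_per_tone)
    return 1.0 - log2(1.0 + sqrt(4.0 * p * (1.0 - p)))


def snr_for_rate(rate_fn, target, lo=1e-6, hi=1e6):
    """Bisection on a log scale for the SNR at which rate_fn reaches target."""
    for _ in range(200):
        mid = sqrt(lo * hi)
        if rate_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return sqrt(lo * hi)


def quantization_loss_db(rate):
    """Extra SNR per tone (in dB) that hard decisions need to reach `rate`."""
    return 10.0 * log10(snr_for_rate(r_hard, rate) / snr_for_rate(r0_soft, rate))


for rate in (0.1, 0.2, 0.3, 0.6, 0.8):
    print(f"rate {rate:.1f}: loss ~ {quantization_loss_db(rate):.2f} dB")
```

Both rate functions increase monotonically with SNR, which is what makes the bisection valid; the printed losses grow with the target rate, in line with the trend described above.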


FIGURE 14-6-15 Cutoff rate for (unquantized) soft-decision and hard-decision decoding of coded binary FSK.


14-6-6 Trellis-Coded Modulation

Trellis-coded modulation was described in Section 8-3 as a means for achieving
a coding gain on bandwidth-constrained channels, where we wish to transmit at
a bit-rate-to-bandwidth ratio R/W > 1. For such channels, the digital
communication system is designed to use bandwidth-efficient multilevel or
multiphase modulation (PAM, PSK, DPSK, or QAM), which allows us to achieve
an R/W > 1. When coding is applied in signal design for a bandwidth-constrained
channel, a coding gain is desired without expanding the signal
bandwidth. This goal can be achieved, as described in Section 8-3, by
increasing the number of signal points in the constellation over the
corresponding uncoded system to compensate for the redundancy introduced by the
code, and designing the trellis code so that the euclidean distance in a sequence
of transmitted symbols corresponding to paths that merge at any node in the
trellis is larger than the euclidean distance per symbol in an uncoded system.

In contrast, the coding schemes that we have described above in conjunction
with FSK modulation expand the bandwidth of the modulated signal for the
purpose of achieving signal diversity. Coupled with FSK modulation, which is
not bandwidth-efficient, the coding schemes we have described are inappropriate
for use on bandwidth-constrained channels.

In designing trellis-coded signal waveforms for fading channels, we may use 
the same basic principles that we have learned and applied in the design of 
conventional coding schemes. In particular, the most important objective in 
any coded signal design for fading channels is to achieve as large a signal 
diversity as possible. This implies that successive output symbols from the 
encoder must be interleaved or sufficiently separated in transmission, either in 
time or in frequency, so as to achieve independent fading in a sequence of 
symbols that equals or exceeds the minimum free distance of the trellis code. 
Therefore, we may represent such a trellis-coded modulation system by the 
block diagram in Fig. 14-6-16, where the interleaver is viewed broadly as a 
device that separates the successive coded symbols so as to provide indepen- 
dent fading on each symbol (through frequency or time separation of symbols) 
in the sequence. The receiver consists of a signal demodulator whose output is 
deinterleaved and fed to the trellis decoder. 


FIGURE 14-6-16 Block diagram of trellis-coded modulation systems.

As indicated above, the candidate modulation methods that achieve high
bandwidth efficiency are M-ary PSK, DPSK, QAM, and PAM. The choice
depends to a large extent on the channel characteristics. If there are rapid
amplitude variations in the received signal, QAM and PAM may be particularly
vulnerable, because a wideband automatic gain control (AGC) must be
used to compensate for the channel variations. In such a case, PSK or DPSK
are more suitable, since the information is conveyed by the signal phase and
not by the signal amplitude. DPSK provides the additional benefit that carrier
phase coherence is required only over two successive symbols. However, there
is an SNR degradation in DPSK relative to PSK.

In the design of the trellis code, our objective is to achieve as large a free
distance as possible, since this parameter is equivalent to the amount of
diversity in the received signal. In conventional Ungerboeck trellis coding,
each branch in the trellis corresponds to a single M-ary (PSK, DPSK, QAM)
output channel symbol. Let us define the shortest error event path as the error
event path with the smallest number of nonzero distances between itself and
the correct path, and let L be its length. In other words, L is the Hamming
distance between the M-ary symbols on the shortest error event path and those
in the correct path. Hence, if we assume that the transmitted sequence
corresponds to the all-zero path in the trellis, L is the number of branches in
the shortest-length path with a nonzero M-ary symbol. In a trellis diagram with
parallel paths, the paths are constrained to have a shortest error event length
of one branch, so that L = 1. This means that such a trellis code provides no
diversity in a fading channel and, hence, the probability of error is inversely
proportional to the SNR per symbol. Therefore, in conventional trellis coding
for a fading channel, it is undesirable to design a code that has parallel paths in
its trellis, because such a code yields no diversity. This is the case in a
conventional rate-m/(m + 1) trellis code, where we are forced to have parallel
paths when the number of states is less than $2^m$.

One possible way to increase the minimum free distance and, thus, the 
order of diversity in the code, is to introduce asymmetry in the signal point 
constellation. This approach appears to be somewhat effective, and has been 
investigated by Simon and Divsalar (1985), Divsalar and Yuen (1984), and 
Divsalar et al. (1987). 

A more effective way to increase the distance L and, thus, the order of
diversity is to employ multiple trellis-coded modulation (MTCM). In MTCM,
illustrated in Fig. 14-6-17, b input bits to the encoder are coded into c output
bits, which are then subdivided into k groups, each of m bits, such that c = km.
Each m-bit group is mapped into an M-ary symbol. Thus, we obtain k M-ary
output symbols. The special case k = 1 corresponds to the conventional
Ungerboeck codes. With k M-ary output symbols, it is possible to design trellis
codes with parallel paths having a distance L = k. Thus, we can achieve an
error probability that decays inversely as $(\mathscr{E}/N_0)^k$.

FIGURE 14-6-17 Block diagram of MTCM transmitter.

An important consideration in the design of the decoder for the trellis code
is the use of any side information regarding the channel attenuation for each
symbol. In the case of FSK modulation with square-law combination at the
decoder to form the decision metrics, it is not necessary to know the channel
attenuation for demodulated symbols. However, with coherent detection, the
optimum euclidean distance metric for each demodulated symbol is of the form
$|r_n - \alpha_n s_n|^2$, where $\alpha_n$ is the channel attenuation for the transmitted symbol $s_n$
and $r_n$ is the demodulation output. Hence, the sum of branch metrics for any
given path through the trellis is of the form

$$D(\mathbf{r}, \mathbf{s}^{(i)}) = \sum_n |r_n - \alpha_n s_n^{(i)}|^2$$

where the superscript (i) indicates the ith path through the trellis. Therefore,
the estimation of the channel attenuation must be performed in order to
realize the optimum trellis decoder. The estimation of the channel attenuation
and phase shift is considered in Appendix C for the case of PSK modulation
and demodulation. The effect of the quality of the attenuation and phase
estimates on the performance of PSK (uncoded) modulation is also assessed in
Appendix C.
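
The path metric with channel side information can be illustrated with a toy computation. The sketch below is an assumption-laden illustration, not the decoder described in the text: it compares a short list of candidate paths by brute force instead of running the Viterbi algorithm over a trellis, and the symbols, attenuation values, and function names are all hypothetical.

```python
def path_metric(received, attenuations, path_symbols):
    """Sum of |r_n - a_n * s_n|^2 over the symbols of one candidate path."""
    return sum(
        abs(r - a * s) ** 2
        for r, a, s in zip(received, attenuations, path_symbols)
    )


def best_path(received, attenuations, candidate_paths):
    """Index of the candidate path with the smallest accumulated metric."""
    metrics = [path_metric(received, attenuations, p) for p in candidate_paths]
    return min(range(len(metrics)), key=metrics.__getitem__)


# Hypothetical toy data: two candidate QPSK paths of three symbols each.
paths = [(1 + 0j, 1j, -1 + 0j), (1 + 0j, -1j, 1 + 0j)]
alpha = (0.9, 0.2, 1.1)                             # per-symbol attenuation estimates
r = tuple(a * s for a, s in zip(alpha, paths[0]))   # noiseless reception of path 0

print("chosen path:", best_path(r, alpha, paths))
print("metrics:", [path_metric(r, alpha, p) for p in paths])
```

Note how the deeply faded second symbol (attenuation 0.2) contributes little to the metric difference, which is the reason the decoder needs the attenuation estimates in the first place.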

14-7 BIBLIOGRAPHICAL NOTES AND REFERENCES 

In this chapter, we have considered a number of topics concerned with digital 
communications over a fading multipath channel. We began with a statistical 
characterization of the channel and then described the ramifications of the 
channel characteristics on the design of digital signals and on their perfor- 
mance. We observed that the reliability of the communication system is 
enhanced by the use of diversity transmission and reception. Finally we 
demonstrated that channel encoding and soft-decision decoding provide a 
bandwidth-efficient means for obtaining diversity over such channels. 

The pioneering work on the characterization of fading multipath channels
and on signal and receiver design for reliable digital communications over such
channels was done by Price (1954, 1956). This work was followed by additional
significant contributions from Price and Green (1958, 1960), Kailath (1960,
1961), and Green (1962). Diversity transmission and diversity combining
techniques under a variety of channel conditions have been considered in the
papers by Pierce (1958), Brennan (1959), Turin (1961, 1962), Pierce and Stein
(1960), Barrow (1963), Bello and Nelin (1962a, b, 1963), Price (1962a, b), and
Lindsey (1964).

Our treatment of coding for fading channels has relied on contributions
from a number of researchers. In particular, the use of dual-k codes with
M-ary orthogonal FSK was proposed in publications by Viterbi and Jacobs
(1975) and Odenwalder (1976). The importance of coding for digital
communications over a fading channel was also emphasized in a paper by Chase
(1976). The benefits derived from concatenated coding with soft-decision
decoding for a fading channel were demonstrated by Pieper et al. (1978).
There, a Reed-Solomon code was used for the outer code and a Hadamard
code was selected as the inner code. The performance of dual-k codes with
either block orthogonal codes or Hadamard codes as inner code was
investigated by Proakis and Rahman (1979). The error rate performance of
maximal free distance binary convolutional codes was evaluated by Rahman
(1981). Finally, the derivation of the cutoff rate for Rayleigh fading channels is
due to Wozencraft and Jacobs (1965).

Trellis-coded modulation for fading channels has been investigated by many 
researchers, whose work was motivated to a large extent by applications to 
mobile and cellular communications. The book by Biglieri et al. (1991) gives a 
tutorial treatment of this topic and contains a large number of references to the 
technical literature. 

Our treatment of digital communications over fading channels focused 
primarily on the Rayleigh fading channel model. For the most part, this is due 
to the wide acceptance of this model for describing the fading effects on many 
radio channels and to its mathematical tractability. Although other statistical 
models, such as the Ricean fading model or the Nakagami fading model, may be 
more appropriate for characterizing fading on some real channels, the general 
approach in the design of reliable communications presented in this chapter 
carries over. 


PROBLEMS 


14-1 The scattering function $S(\tau;\lambda)$ for a fading multipath channel is nonzero for the 
range of values $0 \le \tau \le 1$ ms and $-0.1\ \text{Hz} \le \lambda \le 0.1\ \text{Hz}$. Assume that the scattering 
function is approximately uniform in the two variables. 
a Give numerical values for the following parameters: 

(i) the multipath spread of the channel; 

(ii) the Doppler spread of the channel; 

(iii) the coherence time of the channel; 

(iv) the coherence bandwidth of the channel; 

(v) the spread factor of the channel. 

b Explain the meaning of the following, taking into consideration the answers 
given in (a): 

(i) the channel is frequency-nonselective; 

(ii) the channel is slowly fading; 

(iii) the channel is frequency-selective. 

c Suppose that we have a frequency allocation (bandwidth) of 10 kHz and we wish 
to transmit at a rate of 100 bits/s over this channel. Design a binary 
communications system with frequency diversity. In particular, specify (i) the 
type of modulation, (ii) the number of subchannels, (iii) the frequency 
separation between adjacent carriers, and (iv) the signaling interval used in your 
design. Justify your choice of parameters. 

14-2 Consider a binary communications system for transmitting a binary sequence over 
a fading channel. The modulation is orthogonal FSK with third-order frequency 
diversity ( L = 3). The demodulator consists of matched filters followed by 
square-law detectors. Assume that the FSK carriers fade independently and 
identically according to a Rayleigh envelope distribution. The additive noises on 
the diversity signals are zero-mean gaussian with autocorrelation functions 
$\frac{1}{2}E[z_k^*(t)z_k(t+\tau)] = N_0\delta(\tau)$. The noise processes are mutually statistically 
independent. 

a The transmitted signal may be viewed as binary FSK with square-law detection, 
generated by a repetition code of the form 

$$1 \to \mathbf{C}_1 = [1\ 1\ 1], \qquad 0 \to \mathbf{C}_0 = [0\ 0\ 0]$$

Determine the error rate performance $P_{2h}$ for a hard-decision decoder following 
the square-law-detected signals. 
b Evaluate $P_{2h}$ for $\bar{\gamma}_c = 100$ and 1000. 

c Evaluate the error rate $P_{2s}$ for $\bar{\gamma}_c = 100$ and 1000 if the decoder employs 
soft-decision decoding. 

d Consider the generalization of the result in (a). If a repetition code of block 
length L (L odd) is used, determine the error probability $P_{2h}$ of the 
hard-decision decoder and compare that with $P_{2s}$, the error rate of the 
soft-decision decoder. Assume $\bar{\gamma}_c \gg 1$. 

14-3 Suppose that the binary signal $s_l(t)$ is transmitted over a fading channel and the 
received signal is 

$$r_l(t) = \pm\alpha s_l(t) + z(t), \qquad 0 \le t \le T$$

where $z(t)$ is zero-mean white gaussian noise with autocorrelation function 

$$\phi_{zz}(\tau) = N_0\delta(\tau)$$

The energy in the transmitted signal is $\mathcal{E} = \frac{1}{2}\int_0^T |s_l(t)|^2\,dt$. The channel gain $\alpha$ is 
specified by the probability density function 

$$p(\alpha) = 0.1\,\delta(\alpha) + 0.9\,\delta(\alpha - 2)$$

a Determine the average probability of error $P_2$ for the demodulator that employs 
a filter matched to $s_l(t)$. 

b What value does $P_2$ approach as $\mathcal{E}/N_0$ approaches infinity? 
c Suppose that the same signal is transmitted on two statistically independently 
fading channels with gains $\alpha_1$ and $\alpha_2$, where 

$$p(\alpha_k) = 0.1\,\delta(\alpha_k) + 0.9\,\delta(\alpha_k - 2), \qquad k = 1, 2$$

The noises on the two channels are statistically independent and identically 
distributed. The demodulator employs a matched filter for each channel and 
simply adds the two filter outputs to form the decision variable. Determine the 
average $P_2$. 

d For the case in (c), what value does $P_2$ approach as $\mathcal{E}/N_0$ approaches infinity? 

14-4 A multipath fading channel has a multipath spread of $T_m = 1$ s and a Doppler 
spread $B_d = 0.01$ Hz. The total channel bandwidth at bandpass available for signal 
transmission is W = 5 Hz. To reduce the effects of intersymbol interference, the 
signal designer selects a pulse duration T = 10 s. 
a Determine the coherence bandwidth and the coherence time. 
b Is the channel frequency selective? Explain. 
c Is the channel fading slowly or rapidly? Explain. 

d Suppose that the channel is used to transmit binary data via (antipodal) 
coherently detected PSK in a frequency diversity mode. Explain how you would 
use the available channel bandwidth to obtain frequency diversity and deter- 
mine how much diversity is available. 

e For the case in (d), what is the approximate SNR required per diversity to 
achieve an error probability of $10^{-6}$? 

f Suppose that a wideband signal is used for transmission and a RAKE-type 
receiver is used for demodulation. How many taps would you use in the RAKE 
receiver? 

g Explain whether or not the RAKE receiver can be implemented as a coherent 
receiver with maximal ratio combining. 

h If binary orthogonal signals are used for the wideband signal with square-law 
postdetection combining in the RAKE receiver, what is the approximate SNR 
required to achieve an error probability of $10^{-6}$? (Assume that all taps have the 
same SNR.) 

14-5 In the binary communications system shown in Fig. P14-5, $z_1(t)$ and $z_2(t)$ are 
statistically independent white gaussian noise processes with zero mean and 
identical autocorrelation functions $\phi_{zz}(\tau) = N_0\delta(\tau)$. The sampled values $U_1$ and $U_2$ 
represent the real parts of the matched filter outputs. For example, if $s(t)$ is 
transmitted, then we have 

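$$U_1 = 2\mathcal{E} + N_1$$

$$U_2 = N_1 + N_2$$

where $\mathcal{E}$ is the transmitted signal energy and 

$$N_k = \operatorname{Re}\left[\int_0^T s^*(t)z_k(t)\,dt\right], \qquad k = 1, 2$$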



FIGURE P14-5 

It is apparent that $U_1$ and $U_2$ are correlated gaussian variables while $N_1$ and $N_2$ are 
independent gaussian variables. Thus, 

$$p(N_k) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{N_k^2}{2\sigma^2}\right), \qquad k = 1, 2$$

where the variance of $N_k$ is $\sigma^2 = 2\mathcal{E}N_0$. 

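a Show that the joint probability density function for $U_1$ and $U_2$ is 

$$p(U_1, U_2) = \frac{1}{2\pi\sigma^2}\exp\left\{-\frac{1}{\sigma^2}\left[(U_1 - 2\mathcal{E})^2 - U_2(U_1 - 2\mathcal{E}) + \tfrac{1}{2}U_2^2\right]\right\}$$

if $s(t)$ is transmitted and 

$$p(U_1, U_2) = \frac{1}{2\pi\sigma^2}\exp\left\{-\frac{1}{\sigma^2}\left[(U_1 + 2\mathcal{E})^2 - U_2(U_1 + 2\mathcal{E}) + \tfrac{1}{2}U_2^2\right]\right\}$$

if $-s(t)$ is transmitted. 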

b Based on the likelihood ratio, show that the optimum combination of $U_1$ and $U_2$ 
results in the decision variable 

$$U = U_1 + \beta U_2$$

where $\beta$ is a constant. What is the optimum value of $\beta$? 
c Suppose that $s(t)$ is transmitted. What is the probability density function of U? 
d What is the probability of error assuming that $s(t)$ was transmitted? Express 
your answer as a function of the SNR $\mathcal{E}/N_0$. 
e What is the loss in performance if only $U = U_1$ is the decision variable? 

14-6 Consider the model for a binary communications system with diversity as shown in 
Fig. P14-6. The channels have fixed attenuations and phase shifts. The $\{z_k(t)\}$ are 
complex-valued white gaussian noise processes with zero mean and autocorrelation 
functions 

$$\phi_{zz}(\tau) = \tfrac{1}{2}E[z_k^*(t)z_k(t+\tau)] = N_{0k}\delta(\tau)$$

(Note that the spectral densities $\{N_{0k}\}$ are all different.) Also, the noise processes 
$\{z_k(t)\}$ are mutually statistically independent. The $\{\beta_k\}$ are complex-valued 
weighting factors to be determined. The decision variable from the combiner is 

$$U = \operatorname{Re}\left(\sum_{k=1}^{L}\beta_k U_k\right)$$

a Determine the pdf $p(U)$ when +1 is transmitted. 
b Determine the probability of error $P_2$ as a function of the weights $\{\beta_k\}$. 
c Determine the values of $\{\beta_k\}$ that minimize $P_2$. 

FIGURE P14-6 

14-7 Determine the probability of error for binary orthogonal signaling with Lth-order 
diversity over a Rayleigh fading channel. The pdfs of the two decision variables 
are given by (14-4-31) and (14-4-32). 

14-8 The rate 1/3, L = 3, binary convolutional code with transfer function given by 
(8-2-5) is used for transmitting data over a Rayleigh fading channel via binary 
PSK. 

a Determine and plot the probability of error for hard-decision decoding. Assume 
that the transmitted waveforms corresponding to the coded bits fade 
independently. 

b Determine and plot the probability of error for soft-decision decoding. Assume 
that the waveforms corresponding to the coded bits fade independently. 

14-9 A binary sequence is transmitted via binary antipodal signaling over a Rayleigh 
fading channel with Lth-order diversity. When $s_l(t)$ is transmitted, the received 
equivalent lowpass signals are 

$$r_k(t) = \alpha_k e^{-j\phi_k}s_l(t) + z_k(t), \qquad k = 1, 2, \ldots, L$$

The fading among the L subchannels is statistically independent. The additive 
noise terms $\{z_k(t)\}$ are zero-mean, statistically independent and identically 
distributed white gaussian noise processes with autocorrelation function 
$\phi_{zz}(\tau) = N_0\delta(\tau)$. Each of the L signals is passed through a filter matched to $s_l(t)$ and the 
output is phase-corrected to yield 

$$U_k = \operatorname{Re}\left[e^{j\phi_k}\int_0^T r_k(t)s_l^*(t)\,dt\right], \qquad k = 1, 2, \ldots, L$$

The $\{U_k\}$ are combined by a linear combiner to form the decision variable 

$$U = \sum_{k=1}^{L} U_k$$

a Determine the pdf of U conditional on fixed values for the $\{\alpha_k\}$. 
b Determine the expression for the probability of error when the $\{\alpha_k\}$ are 
statistically independent and identically distributed Rayleigh random variables. 

14-10 The Chernoff bound for the probability of error for binary FSK with diversity L in 
Rayleigh fading was shown to be 

$$P_2(L) < [4p(1-p)]^L = \left[\frac{4(1+\bar{\gamma}_c)}{(2+\bar{\gamma}_c)^2}\right]^L = 2^{-\bar{\gamma}_b g(\bar{\gamma}_c)}$$

where 

$$g(\bar{\gamma}_c) = \frac{1}{\bar{\gamma}_c}\log_2\left[\frac{(2+\bar{\gamma}_c)^2}{4(1+\bar{\gamma}_c)}\right]$$

a Plot $g(\bar{\gamma}_c)$ and determine its approximate maximum value and the value of $\bar{\gamma}_c$ 
where the maximum occurs. 

b For a given $\bar{\gamma}_b$, determine the optimal order of diversity. 
c Compare $P_2(L)$, under the condition that $g(\bar{\gamma}_c)$ is maximized (optimal diversity), 
with the error probability for binary FSK in AWGN with no fading, which is 

$$P_2 = \tfrac{1}{2}e^{-\gamma_b/2}$$

and determine the penalty in SNR due to fading and noncoherent (square-law) 
combining. 

14-11 A DS spread-spectrum system is used to resolve the multipath signal components 
in a two-path radio signal propagation scenario. If the path length of the secondary 
path is 300 m longer than that of the direct path, determine the minimum chip rate 
necessary to resolve the multipath components. 

14-12 A baseband digital communication system employs the signals shown in Fig. 
P14-12(a) for the transmission of two equiprobable messages. It is assumed that 
the communication problem studied here is a “one-shot” communication problem: 
that is, the above messages are transmitted just once and no transmission takes 
place afterward. The channel has no attenuation (a = 1), and the noise is AWGN 
with power spectral density $\frac{1}{2}N_0$. 

a Find an appropriate orthonormal basis for the representation of the signals. 
b In a block diagram, give the precise specifications of the optimum receiver using 
matched filters. Label the diagram carefully. 
c Find the error probability of the optimum receiver. 

d Show that the optimum receiver can be implemented by using just one filter 
(see the block diagram in Fig. P14-12(b)). What are the characteristics of the 
matched filter and the sampler and decision device? 

e Now assume that the channel is not ideal but has an impulse response of 
$c(t) = \delta(t) + \frac{1}{2}\delta(t - \frac{1}{2}T)$. Using the same matched filter as in (d), design an 
optimum receiver. 

f Assuming that the channel impulse response is $c(t) = \delta(t) + a\delta(t - \frac{1}{2}T)$, where a 
is a random variable uniformly distributed on [0, 1], and using the same matched 
filter as in (d), design the optimum receiver. 

FIGURE P14-12 

FIGURE P14-14 

14-13 A communication system employs dual antenna diversity and binary orthogonal 
FSK modulation. The received signals at the two antennas are 

$$r_1(t) = \alpha_1 s(t) + n_1(t)$$

$$r_2(t) = \alpha_2 s(t) + n_2(t)$$

where $\alpha_1$ and $\alpha_2$ are statistically iid Rayleigh random variables, and $n_1(t)$ and $n_2(t)$ 
are statistically independent, zero-mean white gaussian random processes with 
power-spectral density $\frac{1}{2}N_0$. The two signals are demodulated, squared, and then 
combined (summed) prior to detection. 

a Sketch the functional block diagram of the entire receiver, including the 
demodulator, the combiner and the detector. 
b Plot the probability of error for the detector and compare the result with the 
case of no diversity. 

14-14 The two equivalent lowpass signals shown in Fig. P14-14 are used to transmit a 
binary sequence. The equivalent lowpass impulse response of the channel is 
$h(t) = 4\delta(t) - 2\delta(t - T)$. To avoid pulse overlap between successive transmissions, 
the transmission rate in bits/s is selected to be R = 1/2T. The transmitted signals 
are equally probable and are corrupted by additive zero-mean white gaussian 
noise having an equivalent lowpass representation z(t) with an autocorrelation 
function 

$$\phi_{zz}(\tau) = \tfrac{1}{2}E[z^*(t)z(t+\tau)] = N_0\delta(\tau)$$

a Sketch the two possible equivalent lowpass noise-free received waveforms. 
b Specify the optimum receiver and sketch the equivalent lowpass impulse 
responses of all filters used in the optimum receiver. Assume coherent detection 
of the signals. 

14-15 Verify the relation in (14-3-14) by making the change of variable $\gamma = \alpha^2\mathcal{E}_b/N_0$ in 
the Nakagami-m distribution. 


MULTIUSER COMMUNICATIONS 


Our treatment of communication systems up to this point has been focused on 
a single communication link involving a transmitter and a receiver. In this 
chapter, the focus shifts to multiple users and multiple communication links. 
We explore the various ways in which the multiple users access a common 
channel to transmit information. The multiple access methods that are 
described in this chapter form the basis for current and future wireline and 
wireless communication networks, such as satellite networks, cellular and 
mobile communication networks, and underwater acoustic networks. 


15-1 INTRODUCTION TO MULTIPLE ACCESS 
TECHNIQUES 


It is instructive to distinguish among several types of multiuser communication 
systems. One type is a multiple access system in which a large number of users 
share a common communication channel to transmit information to a receiver. 
Such a system is depicted in Fig. 15-1-1. The common channel may be the 
up-link in a satellite communication system, or a cable to which are connected 
a set of terminals that access a central computer, or some frequency band in 
the radio spectrum that is used by multiple users to communicate with a radio 
receiver. For example, in a mobile cellular communication system, the users 
are the mobile transmitters in any particular cell of the system and the receiver 
resides in the base station of the particular cell. 

A second type of multiuser communication system is a broadcast network in 
which a single transmitter sends information to multiple receivers as depicted 

FIGURE 15-1-1 A multiple access system. 


in Fig. 15-1-2. Examples of broadcast systems include the common radio and 
TV broadcast systems, as well as the down-links in a satellite system. 

The multiple access and broadcast networks are probably the most common 
multiuser communication systems. A third type of multiuser system is a 
store-and-forward network, as depicted in Fig. 15-1-3. Yet a fourth type is the 
two-way communication system shown in Fig. 15-1-4. 

In this chapter, we focus on multiple access methods for multiuser 
communications. In general, there are several different ways in which multiple 
users can send information through the communication channel to the receiver. 
One simple method is to subdivide the available channel bandwidth into a 
number, say N, of nonoverlapping frequency subchannels, as shown in Fig. 
15-1-5, and to assign a subchannel to each user upon request by the users. This 
method is generally called frequency-division multiple access (FDMA), and is 
commonly used in wireline channels to accommodate multiple users for voice 
and data transmission. 

FIGURE 15-1-2 A broadcast network. 

FIGURE 15-1-3 

FIGURE 15-1-4 A two-way communication channel. 

FIGURE 15-1-5 Subdivision of the channel into nonoverlapping frequency bands. 

Another method for creating multiple subchannels for multiple access is to 
subdivide the duration $T_f$, called the frame duration, into, say, N 
nonoverlapping subintervals, each of duration $T_f/N$. Then each user who 
wishes to transmit information is assigned to a particular time slot within each 
frame. This multiple access method is called time-division multiple access 
(TDMA) and it is frequently used in data and digital voice transmission. 

We observe that in FDMA and TDMA, the channel is basically partitioned 
into independent single-user subchannels. In this sense, the communication 
system design methods that we have described for single-user communication 
are directly applicable and no new problems are encountered in a multiple 
access environment, except for the additional task of assigning users to 
available channels. 

The interesting problems arise when the data from the users accessing the 
network is bursty in nature. In other words, the information transmissions from 
a single user are separated by periods of no transmission, where these periods 
of silence may be greater than the periods of transmission. Such is the case 
generally with users at various terminals in a computer communications 
network that contains a central computer. To some extent, this is also the case 
in mobile cellular communication systems carrying digitized voice, since speech 
signals typically contain long pauses. 

In such an environment where the transmission from the various users is 
bursty and low-duty-cycle, FDMA and TDMA tend to be inefficient because a 
certain percentage of the available frequency slots or time slots assigned to 
users do not carry information. Ultimately, an inefficiently designed multiple 
access system limits the number of simultaneous users of the channel. 

An alternative to FDMA and TDMA is to allow more than one user to 
share a channel or subchannel by use of direct-sequence spread spectrum 
signals. In this method, each user is assigned a unique code sequence or 
signature sequence that allows the user to spread the information signal across 
the assigned frequency band. Thus signals from the various users are separated 
at the receiver by cross-correlation of the received signal with each of the 
possible user signature sequences. By designing these code sequences to have 
relatively small cross-correlations, the crosstalk inherent in the demodulation 
of the signals received from multiple transmitters is minimized. This multiple 
access method is called code-division multiple access (CDMA). 

In CDMA, the users access the channel in a random manner. Hence, the 
signal transmissions among the multiple users completely overlap both in time 
and in frequency. The demodulation and separation of these signals at the 
receiver is facilitated by the fact that each signal is spread in frequency by the 
pseudo-random code sequence. CDMA is sometimes called spread-spectrum 
multiple access (SSMA). 

An alternative to CDMA is nonspread random access. In such a case, when 
two users attempt to use the common channel simultaneously, their 
transmissions collide and interfere with each other. When that happens, the 
information is lost and must be retransmitted. To handle collisions, one must establish 
protocols for retransmission of messages that have collided. Protocols for 
scheduling the retransmission of collided messages are described below. 

15-2 CAPACITY OF MULTIPLE ACCESS METHODS 

It is interesting to compare FDMA, TDMA, and CDMA in terms of the 
information rate that each multiple access method achieves in an ideal AWGN 
channel of bandwidth W. Let us compare the capacity of K users, where each 
user has average power $P_i = P$ for all $1 \le i \le K$. Recall that in an ideal 
band-limited AWGN channel of bandwidth W, the capacity of a single user is 

$$C = W\log_2\left(1 + \frac{P}{WN_0}\right) \tag{15-2-1}$$

where $\frac{1}{2}N_0$ is the power spectral density of the additive noise. 

FIGURE 15-2-1 

In FDMA, each user is allocated a bandwidth W/K. Hence, the capacity of 
each user is 

$$C_K = \frac{W}{K}\log_2\left[1 + \frac{P}{(W/K)N_0}\right] \tag{15-2-2}$$

and the total capacity for the K users is 

$$KC_K = W\log_2\left(1 + \frac{KP}{WN_0}\right) \tag{15-2-3}$$

Therefore, the total capacity is equivalent to that of a single user with average 
power $P_{av} = KP$. 
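
As a quick numerical check of the FDMA expressions above, the following minimal sketch evaluates the per-user capacity and the total capacity for several values of K. The function names and the numerical values of W, P, and N_0 are illustrative assumptions, not taken from the text.

```python
import math

def fdma_capacity_per_user(W, P, N0, K):
    """Capacity in bits/s of one FDMA user with bandwidth W/K and power P."""
    sub_bw = W / K
    return sub_bw * math.log2(1.0 + P / (sub_bw * N0))

def fdma_total_capacity(W, P, N0, K):
    """Total capacity in bits/s of K FDMA users sharing bandwidth W."""
    return W * math.log2(1.0 + K * P / (W * N0))

# Hypothetical operating point chosen so that P/(W*N0) = 1.
W, P, N0 = 1.0e6, 1.0e-3, 1.0e-9
for K in (1, 2, 10, 100):
    per_user = fdma_capacity_per_user(W, P, N0, K)
    total = fdma_total_capacity(W, P, N0, K)
    print(f"K = {K:3d}: per-user = {per_user:12.1f} bits/s, total = {total:12.1f} bits/s")
```

In this sketch the per-user figure shrinks as K grows while K times the per-user figure reproduces the total, which is the behavior described in the paragraph above.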

It is interesting to note that for a fixed bandwidth W, the total capacity 
increases without bound as the number of users K increases. On the other hand, 
as K increases, each user is allocated a smaller bandwidth (W/K) and, 
consequently, the capacity per user decreases. Figure 15-2-1 illustrates the 
capacity $C_K$ per user normalized by the channel bandwidth W, as a function of 
$\mathcal{E}_b/N_0$, with K as a parameter. This expression is given as 

$$\frac{C_K}{W} = \frac{1}{K}\log_2\left[1 + K\frac{C_K}{W}\left(\frac{\mathcal{E}_b}{N_0}\right)\right] \tag{15-2-4}$$

Normalized capacity as a function of $\mathcal{E}_b/N_0$ for FDMA. 

FIGURE 15-2-2 Total capacity per hertz as a function of $\mathcal{E}_b/N_0$ for FDMA. 

A more compact form of (15-2-4) is obtained by defining the normalized 
total capacity $C_n = KC_K/W$, which is the total bit rate for all K users per unit 
of bandwidth. Thus, (15-2-4) may be expressed as 

$$C_n = \log_2\left(1 + C_n\frac{\mathcal{E}_b}{N_0}\right) \tag{15-2-5}$$

or, equivalently, 

$$\frac{\mathcal{E}_b}{N_0} = \frac{2^{C_n} - 1}{C_n} \tag{15-2-6}$$

The graph of $C_n$ versus $\mathcal{E}_b/N_0$ is shown in Fig. 15-2-2. We observe that $C_n$ 
increases as $\mathcal{E}_b/N_0$ increases above the minimum value of ln 2. 
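
The relation between the normalized total capacity and the required SNR per bit can be explored numerically. The sketch below evaluates the ratio (2^{C_n} - 1)/C_n for a few values of C_n, including a very small one to illustrate the limiting value ln 2; the function name and the sample values of C_n are assumptions made for illustration.

```python
import math

def ebn0_required(c_n):
    """SNR per bit (linear scale) needed for a normalized total capacity c_n."""
    return (2.0 ** c_n - 1.0) / c_n

for c_n in (1e-6, 0.5, 1.0, 2.0, 4.0):
    ratio = ebn0_required(c_n)
    print(f"C_n = {c_n:g}: E_b/N_0 = {ratio:.4f} ({10.0 * math.log10(ratio):.2f} dB)")

print(f"ln 2 = {math.log(2.0):.4f} ({10.0 * math.log10(math.log(2.0)):.2f} dB)")
```

The smallest C_n in the loop gives a ratio very close to ln 2, and the required SNR per bit grows as C_n increases.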

In a TDMA system, each user transmits for 1/K of the time through the 
channel of bandwidth W, with average power KP. Therefore, the capacity per 
user is 

$$C_K = \left(\frac{1}{K}\right)W\log_2\left(1 + \frac{KP}{WN_0}\right) \tag{15-2-7}$$

which is identical to the capacity of an FDMA system. However, from a 
practical standpoint, we should emphasize that, in TDMA, it may not be 
possible for the transmitters to sustain a transmitter power of KP when K is 
very large. Hence, there is a practical limit beyond which the transmitter power 
cannot be increased as K is increased. 

In a CDMA system, each user transmits a pseudo-random signal of a 
bandwidth W and average power P. The capacity of the system depends on the 
level of cooperation among the K users. At one extreme is noncooperative 
CDMA, in which the receiver for each user signal does not know the spreading 
waveforms of the other users, or chooses to ignore them in the demodulation 
process. Hence, the other users' signals appear as interference at the receiver of 
each user. In this case, the multiuser receiver consists of a bank of K 
single-user receivers. If we assume that each user's pseudorandom signal 
waveform is gaussian, then each user signal is corrupted by gaussian 
interference of power (K - 1)P and additive gaussian noise of power $WN_0$. 
Therefore, the capacity per user is 

$$C_K = W\log_2\left[1 + \frac{P}{WN_0 + (K-1)P}\right] \tag{15-2-8}$$

or, equivalently, 

$$\frac{C_K}{W} = \log_2\left[1 + \frac{C_K}{W}\,\frac{\mathcal{E}_b/N_0}{1 + (K-1)(C_K/W)\mathcal{E}_b/N_0}\right] \tag{15-2-9}$$

Figure 15-2-3 illustrates the graph of $C_K/W$ versus $\mathcal{E}_b/N_0$, with K as a 
parameter. 

For a large number of users, we may use the approximation $\ln(1+x) \le x$. 
Hence, 

$$\frac{C_K}{W} \le \frac{C_K}{W}\,\frac{\mathcal{E}_b/N_0}{1 + K(C_K/W)(\mathcal{E}_b/N_0)}\log_2 e \tag{15-2-10}$$

or, equivalently, 

$$C_n \le \log_2 e - \frac{1}{\mathcal{E}_b/N_0} = \frac{1}{\ln 2} - \frac{1}{\mathcal{E}_b/N_0} < \frac{1}{\ln 2} \tag{15-2-11}$$

In this case, we observe that the total capacity does not increase with K as in 
TDMA and FDMA. 
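
The saturation of the total capacity in noncooperative CDMA can be illustrated numerically. The following sketch evaluates the per-user capacity with the other K - 1 users treated as gaussian interference of power (K - 1)P, forms the total capacity per hertz, and compares it with the FDMA total, which keeps growing with K. The function names and the operating point (P/(W N_0) = 10) are assumptions chosen for illustration.

```python
import math

def cdma_capacity_per_user(W, P, N0, K):
    """Per-user capacity (bits/s) when the other K-1 users act as gaussian interference."""
    return W * math.log2(1.0 + P / (W * N0 + (K - 1) * P))

def cdma_total_per_hertz(W, P, N0, K):
    """Normalized total capacity K * C_K / W for noncooperative CDMA."""
    return K * cdma_capacity_per_user(W, P, N0, K) / W

def fdma_total_per_hertz(W, P, N0, K):
    """Normalized total capacity for FDMA (and TDMA), log2(1 + K P / (W N0))."""
    return math.log2(1.0 + K * P / (W * N0))

W, P, N0 = 1.0, 10.0, 1.0   # hypothetical values
for K in (1, 10, 100, 10000):
    print(f"K = {K:5d}: CDMA total = {cdma_total_per_hertz(W, P, N0, K):.4f}, "
          f"FDMA total = {fdma_total_per_hertz(W, P, N0, K):.4f} bits/s/Hz")
print(f"log2(e) = {math.log2(math.e):.4f}")
```

For large K the CDMA total in this sketch settles near log2 e = 1/ln 2 bits/s/Hz, while the FDMA total continues to increase.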

On the other hand, suppose that the K users cooperate by transmitting 
synchronously in time, and the multiuser receiver knows the spreading 
waveforms of all users and jointly demodulates and detects all the users' 
signals. Thus, each user is assigned a rate $R_i$, $1 \le i \le K$, and a codebook 
containing a set of $2^{nR_i}$ codewords of power P. In each signal interval, each 
user selects an arbitrary codeword, say $\mathbf{X}_i$, from its own codebook and all users 
transmit their codewords simultaneously. Thus, the decoder at the receiver 
observes 

$$\mathbf{Y} = \sum_{i=1}^{K}\mathbf{X}_i + \mathbf{Z} \tag{15-2-12}$$

where Z is an additive noise vector. The optimum decoder looks for the K 
codewords, one from each codebook, that have a vector sum closest to the 
received vector Y in euclidean distance. 

The achievable K-dimensional rate region for the K users in an AWGN 
channel, assuming equal power for each user, is given by the following 
equations: 

$$R_i < W\log_2\left(1 + \frac{P}{WN_0}\right), \qquad 1 \le i \le K \tag{15-2-13}$$

$$R_i + R_j < W\log_2\left(1 + \frac{2P}{WN_0}\right), \qquad 1 \le i, j \le K,\ i \ne j \tag{15-2-14}$$

$$\vdots$$

$$\sum_{i=1}^{K} R_i < W\log_2\left(1 + \frac{KP}{WN_0}\right) \tag{15-2-15}$$

In the special case when all the rates are identical, the inequality (15-2-15) is 
dominant over the other K - 1 inequalities. It follows that if the rates 
$\{R_i, 1 \le i \le K\}$ for the K cooperative synchronous users are selected to fall in 
the capacity region specified by the inequalities given above, then the 
probabilities of error for the K users tend to zero as the code block length n 
tends to infinity. 
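
A membership test for the equal-power rate region can be sketched as follows. The sketch assumes that the intermediate inequalities follow the same pattern as the ones stated, namely that the sum of any m rates must be below W log2(1 + mP/(W N_0)); with equal powers it is then enough to check the m largest rates for each m. The function name and the numerical values are hypothetical.

```python
import math

def in_rate_region(rates, W, P, N0):
    """True if, for every m, the m largest rates sum to less than W*log2(1 + m*P/(W*N0))."""
    ordered = sorted(rates, reverse=True)
    partial_sum = 0.0
    for m, rate in enumerate(ordered, start=1):
        partial_sum += rate
        if partial_sum >= W * math.log2(1.0 + m * P / (W * N0)):
            return False
    return True

W, P, N0 = 1.0, 1.0, 1.0   # hypothetical values, so P/(W*N0) = 1
for rates in [(0.6, 0.6, 0.6), (0.9, 0.9, 0.1), (0.95, 0.6, 0.4)]:
    print(rates, "inside region:", in_rate_region(rates, W, P, N0))
```

With these values the single-user limit is 1, the two-user limit is log2 3, and the three-user limit is 2 bits/s/Hz, so the second rate triple fails only because of its two-user sum.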

From the above discussion, we conclude that the sum of the rates of the K 
users goes to infinity with K. Therefore, with cooperative synchronous users, 
the capacity of CDMA has a form similar to that of FDMA and TDMA. Note 
that if all the rates in the CDMA system are selected to be identical to R, then 
(15-2-15) reduces to 

$$R < \frac{W}{K}\log_2\left(1 + \frac{KP}{WN_0}\right) \tag{15-2-16}$$

which is identical to the rate constraint in FDMA and TDMA. In this case, 
CDMA does not yield a higher rate than TDMA and FDMA. However, if the 
rates of the K users are selected to be unequal such that the inequalities 
(15-2-13)-(15-2-15) are satisfied, then it is possible to find the points in the 
achievable rate region such that the sum of the rates for the K users in CDMA 
exceeds the capacity of FDMA and TDMA. 

FIGURE 15-2-4 Capacity region of two-user CDMA multiple access gaussian channel. 


Example 15-2-1 

Consider the case of two users in a CDMA system that employs coded 
signals as described above. The rates of the two users must satisfy the 
inequalities 

$$R_1 < W\log_2\left(1 + \frac{P}{WN_0}\right)$$

$$R_2 < W\log_2\left(1 + \frac{P}{WN_0}\right)$$

$$R_1 + R_2 < W\log_2\left(1 + \frac{2P}{WN_0}\right)$$

where P is the average transmitted power of each user and W is the signal 
bandwidth. Let us determine the capacity region for the two-user CDMA 
system. 

The capacity region for the two-user CDMA system with coded signal 
waveforms has the form illustrated in Fig. 15-2-4, where 

$$C_i = W\log_2\left(1 + \frac{P_i}{WN_0}\right), \qquad i = 1, 2$$

are the capacities corresponding to the two users with $P_1 = P_2 = P$. We note 
that if user 1 is transmitting at capacity $C_1$, user 2 can transmit up to a 
maximum rate 

$$R_{2m} = W\log_2\left(1 + \frac{2P}{WN_0}\right) - C_1 = W\log_2\left(1 + \frac{P}{P + WN_0}\right) \tag{15-2-17}$$

which is illustrated in Fig. 15-2-4 as point A. This result has an interesting 
interpretation. We note that rate $R_{2m}$ corresponds to the case in which the 
signal from user 1 is considered as an equivalent additive noise in the 
detection of the signal of user 2. On the other hand, user 1 can transmit at 
capacity $C_1$, since the receiver knows the transmitted signal from user 2 and, 
hence, it can eliminate its effect in detecting the signal of user 1. 

Due to symmetry, a similar situation exists if user 2 is transmitting at 
capacity $C_2$. Then, user 1 can transmit up to a maximum rate $R_{1m} = R_{2m}$, 
which is illustrated in Fig. 15-2-4 as point B. In this case, we have a similar 
interpretation as above, with an interchange in the roles of user 1 and user 2. 

The points A and B are connected by a straight line. It is easily seen that 
this straight line is the boundary of the achievable rate region, since any 
point on the line corresponds to the maximum rate $W\log_2(1 + 2P/WN_0)$, 
which can be obtained by simply time-sharing the channel between the two 
users. 
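
The corner points A and B of the two-user region can be computed directly. The sketch below evaluates the single-user capacity, the sum-rate limit, and the maximum rate of the second user when the first transmits at capacity, and checks that this rate equals the capacity obtained when the other user's signal is treated as additional noise. The function name and the values W = P = N_0 = 1 are assumptions for illustration.

```python
import math

def two_user_corner_points(W, P, N0):
    """Return point A, point B, and the sum-rate limit for two equal-power users."""
    c_single = W * math.log2(1.0 + P / (W * N0))         # C_1 = C_2
    c_sum = W * math.log2(1.0 + 2.0 * P / (W * N0))      # limit on R_1 + R_2
    r_max_other = c_sum - c_single                        # R_2m (and R_1m by symmetry)
    point_a = (c_single, r_max_other)                     # user 1 at capacity
    point_b = (r_max_other, c_single)                     # user 2 at capacity
    return point_a, point_b, c_sum

W, P, N0 = 1.0, 1.0, 1.0   # hypothetical values
point_a, point_b, c_sum = two_user_corner_points(W, P, N0)
noise_like = W * math.log2(1.0 + P / (P + W * N0))        # other user treated as noise
print("A =", point_a, "B =", point_b)
print("sum-rate limit =", c_sum, "rate with other user as noise =", noise_like)

# Time-sharing between A and B keeps the sum rate on the boundary.
for share in (0.0, 0.25, 0.5, 1.0):
    r1 = share * point_a[0] + (1.0 - share) * point_b[0]
    r2 = share * point_a[1] + (1.0 - share) * point_b[1]
    print(f"share = {share:.2f}: R1 + R2 = {r1 + r2:.6f}")
```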

In the next section, we consider the problem of signal detection for a 
multiuser CDMA system and assess the performance and the computational 
complexity of several receiver structures. 


15-3 CODE-DIVISION MULTIPLE ACCESS 

As we have observed, TDMA and FDMA are multiple access methods in 
which the channel is partitioned into independent, single-user subchannels, i.e., 
nonoverlapping time slots or frequency bands, respectively. In CDMA, each 
user is assigned a distinct signature sequence (or waveform), which the user 
employs to modulate and spread the information-bearing signal. The signature 
sequences also allow the receiver to demodulate the message transmitted by 
multiple users of the channel, who transmit simultaneously and, generally, 
asynchronously. 

In this section, we treat the demodulation and detection of multiuser 
CDMA signals. We shall see that the optimum maximum-likelihood detector 
has a computational complexity that grows exponentially with the number of 
users. Such a high complexity serves as a motivation to devise suboptimum 
detectors having lower computational complexities. Finally, we consider the 
performance characteristics of the various detectors. 

15-3-1 CDMA Signal and Channel Models 

Let us consider a CDMA channel that is shared by K simultaneous users. Each 
user is assigned a signature waveform $g_k(t)$ of duration T, where T is the 
symbol interval. A signature waveform may be expressed as 

$$g_k(t) = \sum_{n=0}^{L-1} a_k(n)p(t - nT_c), \qquad 0 \le t \le T \tag{15-3-1}$$

where $\{a_k(n), 0 \le n \le L-1\}$ is a pseudo-noise (PN) code sequence consisting 
of L chips that take values {±1}, p(t) is a pulse of duration $T_c$, and $T_c$ is the 
chip interval. Thus, we have L chips per symbol and $T = LT_c$. Without loss of 
generality, we assume that all K signature waveforms have unit energy, i.e., 

$$\int_0^T g_k^2(t)\,dt = 1 \tag{15-3-2}$$

The cross-correlations between pairs of signature waveforms play an 
important role in the metrics for the signal detector and in its performance. 
We define the following cross-correlations: 

$$\rho_{ij}(\tau) = \int_0^T g_i(t)g_j(t-\tau)\,dt, \qquad i \le j \tag{15-3-3}$$

$$\rho_{ji}(\tau) = \int_0^T g_i(t)g_j(t+T-\tau)\,dt, \qquad i \le j \tag{15-3-4}$$
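
A chip-rate analogue of the signature waveforms and their cross-correlations can be sketched as follows. The sketch assumes rectangular chip pulses, delays that are integer multiples of the chip interval, and signature waveforms that are zero outside the symbol interval; random ±1 sequences stand in for PN codes. All function names and parameter values are hypothetical.

```python
import random

def make_signature(num_chips, rng):
    """Random +/-1 chip sequence scaled to unit energy (a stand-in for a PN code)."""
    scale = num_chips ** -0.5
    return [rng.choice((-1.0, 1.0)) * scale for _ in range(num_chips)]

def rho_ij(g_i, g_j, shift):
    """Sum of g_i[n] * g_j[n - shift] over the chips where both are defined."""
    return sum(g_i[n] * g_j[n - shift] for n in range(shift, len(g_i)))

def rho_ji(g_i, g_j, shift):
    """Sum of g_i[n] * g_j[n + L - shift] over the first `shift` chips."""
    num_chips = len(g_i)
    return sum(g_i[n] * g_j[n + num_chips - shift] for n in range(shift))

rng = random.Random(2024)
g1 = make_signature(31, rng)
g2 = make_signature(31, rng)
print("energy of g1:", rho_ij(g1, g1, 0))
for shift in (0, 1, 5, 15):
    print(f"shift = {shift:2d}: rho_12 = {rho_ij(g1, g2, shift):+.4f}, "
          f"rho_21 = {rho_ji(g1, g2, shift):+.4f}")
```

In this discrete setting the two partial correlations add up to the periodic (circular) cross-correlation of the two chip sequences at the given shift.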

For simplicity, we assume that binary antipodal signals are used to transmit 
the information from each user. Hence, let the information sequence of the kth 
user be denoted by $\{b_k(m)\}$, where the value of each information bit may be 
±1. It is convenient to consider the transmission of a block of bits of some 
arbitrary length, say N. Then, the data block from the kth user is 

$$\mathbf{b}_k = [b_k(1) \cdots b_k(N)]' \tag{15-3-5}$$

and the corresponding equivalent lowpass, transmitted waveform may be 
expressed as 

$$s_k(t) = \sqrt{\mathcal{E}_k}\sum_{i=1}^{N} b_k(i)g_k(t - iT) \tag{15-3-6}$$

where $\mathcal{E}_k$ is the signal energy per bit. The composite transmitted signal for the 
K users may be expressed as 

$$s(t) = \sum_{k=1}^{K} s_k(t - \tau_k) = \sum_{k=1}^{K}\sqrt{\mathcal{E}_k}\sum_{i=1}^{N} b_k(i)g_k(t - iT - \tau_k) \tag{15-3-7}$$

where $\{\tau_k\}$ are the transmission delays, which satisfy the condition $0 \le \tau_k < T$ 
for $1 \le k \le K$. Without loss of generality, we assume that 
$0 \le \tau_1 \le \tau_2 \le \cdots \le \tau_K < T$. This is the model for the multiuser transmitted signal in an 
asynchronous mode. In the special case of synchronous transmission, $\tau_k = 0$ for 
$1 \le k \le K$. The values of $\tau$ of interest in the cross-correlations given by (15-3-3) 
and (15-3-4) may also be restricted to $0 \le \tau < T$, without loss of generality. 

The transmitted signal is assumed to be corrupted by AWGN. Hence, the 
received signal may be expressed as 

$$r(t) = s(t) + n(t) \tag{15-3-8}$$

where s(t) is given by (15-3-7) and n(t) is the noise, with power spectral density 
$\tfrac{1}{2}N_0$. 


15-3-2 The Optimum Receiver 

The optimum receiver is defined as the receiver that selects the most probable 
sequence of bits $\{b_k(n),\ 1 \le n \le N,\ 1 \le k \le K\}$ given the received signal r(t) 
observed over the time interval $0 \le t \le NT + 2T$. First, let us consider the case 
of synchronous transmission; later, we shall consider asynchronous 
transmission. 


Synchronous Transmission In synchronous transmission, each (user) interferer 
produces exactly one symbol which interferes with the desired symbol. In 
additive white gaussian noise, it is sufficient to consider the signal received in 
one signal interval, say $0 \le t \le T$, and determine the optimum receiver. Hence, 
r(t) may be expressed as 

$$r(t) = \sum_{k=1}^{K} \sqrt{\mathcal{E}_k}\,b_k(1)\,g_k(t) + n(t), \qquad 0 \le t \le T \tag{15-3-9}$$

The optimum maximum-likelihood receiver computes the log-likelihood 
function 

$$\Lambda(\mathbf{b}) = \int_0^T \Big[r(t) - \sum_{k=1}^{K} \sqrt{\mathcal{E}_k}\,b_k(1)\,g_k(t)\Big]^2 dt \tag{15-3-10}$$

and selects the information sequence $\{b_k(1),\ 1 \le k \le K\}$ that minimizes $\Lambda(\mathbf{b})$. If 
we expand the integral in (15-3-10), we obtain 

$$\Lambda(\mathbf{b}) = \int_0^T r^2(t)\,dt - 2\sum_{k=1}^{K} \sqrt{\mathcal{E}_k}\,b_k(1) \int_0^T r(t)\,g_k(t)\,dt + \sum_{j=1}^{K}\sum_{k=1}^{K} \sqrt{\mathcal{E}_j\mathcal{E}_k}\,b_k(1)\,b_j(1) \int_0^T g_k(t)\,g_j(t)\,dt \tag{15-3-11}$$


We observe that the integral involving $r^2(t)$ is common to all possible 
sequences $\{b_k(1)\}$ and is of no relevance in determining which sequence was 
transmitted. Hence, it may be neglected. The term 

$$r_k = \int_0^T r(t)\,g_k(t)\,dt, \qquad 1 \le k \le K \tag{15-3-12}$$

represents the cross-correlation of the received signal with each of the K 
signature sequences. Instead of cross-correlators, we may employ matched 
filters. Finally, the integral involving $g_k(t)$ and $g_j(t)$ is simply 

$$\rho_{jk}(0) = \int_0^T g_j(t)\,g_k(t)\,dt \tag{15-3-13}$$

Therefore, (15-3-11) may be expressed in the form of correlation metrics 

$$C(\mathbf{r}_K, \mathbf{b}_K) = 2\sum_{k=1}^{K} \sqrt{\mathcal{E}_k}\,b_k(1)\,r_k - \sum_{j=1}^{K}\sum_{k=1}^{K} \sqrt{\mathcal{E}_j\mathcal{E}_k}\,b_k(1)\,b_j(1)\,\rho_{jk}(0) \tag{15-3-14}$$

These correlation metrics may also be expressed in vector inner product form 
as 

$$C(\mathbf{r}_K, \mathbf{b}_K) = 2\mathbf{b}_K'\mathbf{r}_K - \mathbf{b}_K'\mathbf{R}_s\mathbf{b}_K \tag{15-3-15}$$

where 

$$\mathbf{r}_K = [r_1\ \ r_2\ \cdots\ r_K]', \qquad \mathbf{b}_K = [\sqrt{\mathcal{E}_1}\,b_1(1)\ \cdots\ \sqrt{\mathcal{E}_K}\,b_K(1)]'$$

and $\mathbf{R}_s$ is the correlation matrix, with elements $\rho_{jk}(0)$. It is observed that the 
optimum detector must have knowledge of the received signal energies in 
order to compute the correlation metrics. 

There are $2^K$ possible choices of the bits in the information sequence of the 
K users. The optimum detector computes the correlation metrics for each 
sequence and selects the sequence that yields the largest correlation metric. 
We observe that the optimum detector has a complexity that grows 
exponentially with the number of users, K. 

In summary, the optimum receiver for symbol-synchronous transmission 
consists of a bank of K correlators or matched filters followed by a detector 
that computes the $2^K$ correlation metrics given by (15-3-15) corresponding to 
the $2^K$ possible transmitted information sequences. Then, the detector selects 
the sequence corresponding to the largest correlation metric. 
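The exhaustive search summarized above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`correlation_metric`, `ml_detect`), the three-user correlation matrix, and the energies are hypothetical, and the correlator outputs are generated without noise so that the result is easy to check by hand.

```python
import itertools
import math

def correlation_metric(r, b_scaled, R):
    """C = 2 b'r - b'R b, the metric of (15-3-15), for amplitude-scaled bits."""
    K = len(r)
    linear = sum(b_scaled[k] * r[k] for k in range(K))
    quad = sum(b_scaled[j] * R[j][k] * b_scaled[k]
               for j in range(K) for k in range(K))
    return 2.0 * linear - quad

def ml_detect(r, energies, R):
    """Search all 2^K bit vectors and return the one with the largest metric."""
    K = len(r)
    amps = [math.sqrt(e) for e in energies]
    best_bits, best_metric = None, -math.inf
    for bits in itertools.product((-1, 1), repeat=K):
        scaled = [amps[k] * bits[k] for k in range(K)]
        metric = correlation_metric(r, scaled, R)
        if metric > best_metric:
            best_bits, best_metric = bits, metric
    return best_bits

# Noise-free example with three users (all values assumed for illustration).
R = [[1.0, 0.3, -0.2], [0.3, 1.0, 0.4], [-0.2, 0.4, 1.0]]
energies = [1.0, 4.0, 0.25]
true_bits = (1, -1, -1)
scaled = [math.sqrt(e) * b for e, b in zip(energies, true_bits)]
r = [sum(R[j][k] * scaled[k] for k in range(3)) for j in range(3)]
detected = ml_detect(r, energies, R)
```

The loop visits every one of the $2^K$ candidates, which makes the exponential growth in K visible directly in the code.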

Asynchronous Transmission In this case, there are exactly two consecutive 
symbols from each interferer that overlap a desired symbol. We assume that 
the receiver knows the received signal energies $\{\mathcal{E}_k\}$ for the K users and the 
transmission delays $\{\tau_k\}$. Clearly, these parameters must be measured at the 
receiver or provided to the receiver as side information by the users via some 
control channel. 

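The optimum maximum-likelihood receiver computes the log-likelihood 
function 

$$\Lambda(\mathbf{b}) = \int_0^{NT+2T} \Big[r(t) - \sum_{k=1}^{K} \sqrt{\mathcal{E}_k}\,\sum_{i=1}^{N} b_k(i)\,g_k(t - iT - \tau_k)\Big]^2 dt$$

$$= \int_0^{NT+2T} r^2(t)\,dt - 2\sum_{k=1}^{K} \sqrt{\mathcal{E}_k}\,\sum_{i=1}^{N} b_k(i) \int_0^{NT+2T} r(t)\,g_k(t - iT - \tau_k)\,dt$$

$$\quad + \sum_{k=1}^{K}\sum_{l=1}^{K} \sqrt{\mathcal{E}_k\mathcal{E}_l}\,\sum_{i=1}^{N}\sum_{j=1}^{N} b_k(i)\,b_l(j) \int_0^{NT+2T} g_k(t - iT - \tau_k)\,g_l(t - jT - \tau_l)\,dt \tag{15-3-16}$$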

where $\mathbf{b}$ represents the data sequences from the K users. The integral involving 
$r^2(t)$ may be ignored, since it is common to all possible information sequences. 
The integral 

$$r_k(i) \equiv \int_{iT+\tau_k}^{(i+1)T+\tau_k} r(t)\,g_k(t - iT - \tau_k)\,dt, \qquad 1 \le i \le N \tag{15-3-17}$$

represents the outputs of the correlator or matched filter for the kth user in 
each of the signal intervals. Finally, the integral 

$$\int_0^{NT+2T} g_k(t - iT - \tau_k)\,g_l(t - jT - \tau_l)\,dt = \int_{-iT-\tau_k}^{NT+2T-iT-\tau_k} g_k(t)\,g_l(t + iT - jT + \tau_k - \tau_l)\,dt \tag{15-3-18}$$

may be easily decomposed into terms involving the cross-correlation 
$\rho_{kl}(\tau) = \rho_{kl}(\tau_k - \tau_l)$ for $k \le l$ and $\rho_{lk}(\tau)$ for $k > l$. Therefore, we observe that the 
log-likelihood function may be expressed in terms of a correlation metric that 
involves the outputs $\{r_k(i),\ 1 \le k \le K,\ 1 \le i \le N\}$ of K correlators or matched 
filters, one for each of the K signature sequences. Using vector notation, it 
can be shown that the NK correlator or matched filter outputs $\{r_k(i)\}$ can be 
expressed in the form 

$$\mathbf{r} = \mathbf{R}_N\mathbf{b} + \mathbf{n} \tag{15-3-19}$$


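where, by definition 

$$\mathbf{r} = [\mathbf{r}'(1)\ \ \mathbf{r}'(2)\ \cdots\ \mathbf{r}'(N)]', \qquad \mathbf{r}(i) = [r_1(i)\ \ r_2(i)\ \cdots\ r_K(i)]' \tag{15-3-20}$$

$$\mathbf{b} = [\mathbf{b}'(1)\ \ \mathbf{b}'(2)\ \cdots\ \mathbf{b}'(N)]', \qquad \mathbf{b}(i) = [\sqrt{\mathcal{E}_1}\,b_1(i)\ \ \sqrt{\mathcal{E}_2}\,b_2(i)\ \cdots\ \sqrt{\mathcal{E}_K}\,b_K(i)]' \tag{15-3-21}$$

$$\mathbf{n} = [\mathbf{n}'(1)\ \ \mathbf{n}'(2)\ \cdots\ \mathbf{n}'(N)]', \qquad \mathbf{n}(i) = [n_1(i)\ \ n_2(i)\ \cdots\ n_K(i)]' \tag{15-3-22}$$

$$\mathbf{R}_N = \begin{bmatrix}
\mathbf{R}_a(0) & \mathbf{R}_a'(1) & \mathbf{0} & \cdots & \cdots & \mathbf{0} \\
\mathbf{R}_a(1) & \mathbf{R}_a(0) & \mathbf{R}_a'(1) & \mathbf{0} & \cdots & \mathbf{0} \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{R}_a(1) & \mathbf{R}_a(0) & \mathbf{R}_a'(1) \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{R}_a(1) & \mathbf{R}_a(0)
\end{bmatrix} \tag{15-3-23}$$

and $\mathbf{R}_a(m)$ is a K × K matrix with elements 

$$R_{kl}(m) = \int_{-\infty}^{\infty} g_k(t - \tau_k)\,g_l(t + mT - \tau_l)\,dt \tag{15-3-24}$$

The gaussian noise vectors $\mathbf{n}(i)$ have zero mean and autocorrelation matrix 

$$E[\mathbf{n}(k)\,\mathbf{n}'(j)] = \tfrac{1}{2}N_0\,\mathbf{R}_a(k - j) \tag{15-3-25}$$

Note that the vector $\mathbf{r}$ given by (15-3-19) constitutes a set of sufficient statistics 
for estimating the transmitted bits $b_k(i)$. 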
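The block structure of the NK × NK correlation matrix in (15-3-23) can be assembled mechanically from the two K × K blocks. The sketch below is an illustration only: `build_RN` is a hypothetical helper, and the two-user blocks use assumed correlation values (with the delays ordered so that the one-symbol-offset block has a single nonzero entry above its diagonal).

```python
def build_RN(Ra0, Ra1, N):
    """Assemble the block-tridiagonal matrix of (15-3-23) as nested lists."""
    K = len(Ra0)
    Ra1_T = [[Ra1[c][r] for c in range(K)] for r in range(K)]   # transpose of R_a(1)
    RN = [[0.0] * (N * K) for _ in range(N * K)]
    for blk in range(N):
        for r in range(K):
            for c in range(K):
                RN[blk * K + r][blk * K + c] = Ra0[r][c]               # diagonal block
                if blk + 1 < N:
                    RN[(blk + 1) * K + r][blk * K + c] = Ra1[r][c]     # block below
                    RN[blk * K + r][(blk + 1) * K + c] = Ra1_T[r][c]   # block above
    return RN

# Two users, three symbol intervals (values assumed for illustration).
Ra0 = [[1.0, 0.4], [0.4, 1.0]]
Ra1 = [[0.0, 0.2], [0.0, 0.0]]
RN = build_RN(Ra0, Ra1, 3)
```

Because each symbol overlaps only its immediate neighbours, all blocks more than one position away from the diagonal remain zero.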

If we adopt a block processing approach, the optimum ML detector must 
compute $2^{NK}$ correlation metrics and select the K sequences of length N that 
correspond to the largest correlation metric. Clearly, such an approach is 
much too complex computationally to be implemented in practice, especially 
when K and N are large. An alternative approach is ML sequence estimation 
employing the Viterbi algorithm. In order to construct a sequential-type 
detector, we make use of the fact that each transmitted symbol overlaps at 
most with 2K - 2 symbols. Thus, a significant reduction in computational 
complexity is obtained with respect to the block size parameter N, but the 
exponential dependence on K cannot be reduced. 

It is apparent that the optimum ML receiver employing the Viterbi 
algorithm involves such a high computational complexity that its use in practice 
is limited to communication systems where the number of users is extremely 
small, e.g., K < 10. For larger values of K, one should consider a 
sequential-type detector that is akin to either the sequential decoding or the stack 
algorithms described in Chapter 8. Below, we consider a number of 
suboptimum detectors whose complexity grows linearly with K. 

15-3-3 Suboptimum Detectors 

In the above discussion, we observed that the optimum detector for the K 
CDMA users has a computational complexity, measured in the number of 
arithmetic operations (additions and multiplications/divisions) per modulated 
symbol, that grows exponentially with K. In this subsection we describe 
suboptimum detectors with computational complexities that grow linearly with 
the number of users, K. We begin with the simplest suboptimum detector, 
which we call the conventional (single-user) detector. 

Conventional Single-User Detector In conventional single-user detection, 
the receiver for each user consists of a demodulator that correlates (or 
match-filters) the received signal with the signature sequence of the user and 
passes the correlator output to the detector, which makes a decision based on 
the single correlator output. Thus, the conventional detector neglects the 
presence of the other users of the channel or, equivalently, assumes that the 
aggregate noise plus interference is white and gaussian. 

Let us consider synchronous transmission. Then, the output of the 
correlator for the kth user for the signal in the interval $0 \le t \le T$ is 

$$r_k = \int_0^T r(t)\,g_k(t)\,dt \tag{15-3-26}$$

$$= \sqrt{\mathcal{E}_k}\,b_k(1) + \sum_{\substack{j=1 \\ j \ne k}}^{K} \sqrt{\mathcal{E}_j}\,b_j(1)\,\rho_{jk}(0) + n_k(1) \tag{15-3-27}$$

where the noise component $n_k(1)$ is given as 

$$n_k(1) = \int_0^T n(t)\,g_k(t)\,dt \tag{15-3-28}$$

Since n(t) is white gaussian noise with power spectral density $\tfrac{1}{2}N_0$, the variance 
of $n_k(1)$ is 

$$E[n_k^2(1)] = \tfrac{1}{2}N_0 \int_0^T g_k^2(t)\,dt = \tfrac{1}{2}N_0 \tag{15-3-29}$$

Clearly, if the signature sequences are orthogonal, the interference from the 
other users given by the middle term in (15-3-27) vanishes and the 
conventional single-user detector is optimum. On the other hand, if one or more of 
the other signature sequences are not orthogonal to the user signature 
sequence, the interference from the other users can become excessive if the 
power levels of the signals (or the received signal energies) of one or more of 
the other users are sufficiently larger than the power level of the kth user. This 
situation is generally called the near-far problem in multiuser communications, 
and necessitates some type of power control for conventional detection. 
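The near-far effect can be seen in a noise-free evaluation of the correlator output (15-3-27). The following sketch assumes two synchronous users with cross-correlation 0.5; the function name `conventional_outputs` and all numerical values are hypothetical choices for illustration.

```python
import math

def conventional_outputs(bits, energies, R, noise=None):
    """Correlator outputs r_k of (15-3-27) for synchronous users."""
    K = len(bits)
    noise = noise or [0.0] * K
    return [
        sum(math.sqrt(energies[j]) * bits[j] * R[j][k] for j in range(K)) + noise[k]
        for k in range(K)
    ]

def sign(x):
    return 1 if x >= 0 else -1

R = [[1.0, 0.5], [0.5, 1.0]]      # assumed signature cross-correlation
bits = (1, -1)                    # transmitted bits of users 1 and 2

# Equal energies: both single-user decisions are correct.
balanced = [sign(x) for x in conventional_outputs(bits, [1.0, 1.0], R)]

# User 2 received with nine times the energy: user 1 is now decided wrongly.
near_far = [sign(x) for x in conventional_outputs(bits, [1.0, 9.0], R)]
```

In the second case the interference term for user 1 is $-1.5$, which exceeds the desired signal amplitude of 1, so the decision flips even without noise.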

In asynchronous transmission, the conventional detector is more vulnerable 
to interference from other users. This is because it is not possible to design 
signature sequences for any pair of users that are orthogonal for all time 
offsets. Consequently, interference from other users is unavoidable in 
asynchronous transmission with conventional single-user detection. In such a 
case, the near-far problem resulting from unequal power in the signals 
transmitted by the various users is particularly serious. The practical solution 
generally requires a power adjustment method that is controlled by the 
receiver via a separate communication channel that all users are continuously 
monitoring. Another option is to employ one of the multiuser detectors 
described below. 

Decorrelating Detector We observe that the conventional detector has a 
complexity that grows linearly with the number of users, but its vulnerability to 
the near-far problem requires some type of power control. We shall now 
devise another type of detector that also has a linear computational complexity 
but does not exhibit the vulnerability to other-user interference. 

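Let us first consider the case of symbol-synchronous transmission. In this 
case, the received signal vector $\mathbf{r}_K$ that represents the output of the K matched 
filters is 

$$\mathbf{r}_K = \mathbf{R}_s\mathbf{b}_K + \mathbf{n}_K \tag{15-3-30}$$

where $\mathbf{b}_K = [\sqrt{\mathcal{E}_1}\,b_1(1)\ \ \sqrt{\mathcal{E}_2}\,b_2(1)\ \cdots\ \sqrt{\mathcal{E}_K}\,b_K(1)]'$ and the noise vector 
$\mathbf{n}_K = [n_1(1)\ \ n_2(1)\ \cdots\ n_K(1)]'$ has a covariance 

$$E(\mathbf{n}_K\mathbf{n}_K') = \mathbf{R}_s \tag{15-3-31}$$

Since the noise is gaussian, $\mathbf{r}_K$ is described by a K-dimensional gaussian pdf 
with mean $\mathbf{R}_s\mathbf{b}_K$ and covariance $\mathbf{R}_s$. That is, 

$$p(\mathbf{r}_K \mid \mathbf{b}_K) = \frac{1}{\sqrt{(2\pi)^K \det \mathbf{R}_s}} \exp\left[-\tfrac{1}{2}(\mathbf{r}_K - \mathbf{R}_s\mathbf{b}_K)'\,\mathbf{R}_s^{-1}(\mathbf{r}_K - \mathbf{R}_s\mathbf{b}_K)\right] \tag{15-3-32}$$

The best linear estimate of $\mathbf{b}_K$ is the value of $\mathbf{b}_K$ that minimizes the quadratic 
log-likelihood function 

$$\Lambda(\mathbf{b}_K) = (\mathbf{r}_K - \mathbf{R}_s\mathbf{b}_K)'\,\mathbf{R}_s^{-1}(\mathbf{r}_K - \mathbf{R}_s\mathbf{b}_K) \tag{15-3-33}$$

FIGURE 15-3-1 Receiver structure for decorrelation receiver. 

The result of this minimization yields 

$$\mathbf{b}_K^0 = \mathbf{R}_s^{-1}\mathbf{r}_K \tag{15-3-34}$$

Then, the detected symbols are obtained by taking the sign of each element of 
$\mathbf{b}_K^0$, i.e. 

$$\hat{\mathbf{b}}_K = \operatorname{sgn}(\mathbf{b}_K^0) \tag{15-3-35}$$

Figure 15-3-1 illustrates the receiver structure. Note from (15-3-34) and 
(15-3-35) that the decorrelator requires knowledge of the relative delays, in 
general, to form $\mathbf{R}_s$; no knowledge of the signal amplitudes is required. 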

Since the estimate $\mathbf{b}_K^0$ is obtained by performing a linear transformation on 
the vector of correlator outputs, the computational complexity is linear in K. 

The reader should observe that the best (maximum-likelihood) linear 
estimate of $\mathbf{b}_K$ given by (15-3-34) is different from the optimum nonlinear ML 
sequence detector that finds the best discrete-valued {±1} sequence that 
maximizes the likelihood function. It is also interesting to note that the 
estimate $\mathbf{b}_K^0$ is the best linear estimate that maximizes the correlation metric 
given by (15-3-15). 

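An interesting interpretation of the detector that computes $\mathbf{b}_K^0$ as in 
(15-3-34) and makes decisions according to (15-3-35) is obtained by considering 
the case of K = 2 users. In this case, 

$$\mathbf{R}_s = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \tag{15-3-36}$$

$$\mathbf{R}_s^{-1} = \frac{1}{1-\rho^2} \begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix} \tag{15-3-37}$$

where 

$$\rho = \int_0^T g_1(t)\,g_2(t)\,dt \tag{15-3-38}$$

Then, if we correlate the received signal 

$$r(t) = \sqrt{\mathcal{E}_1}\,b_1 g_1(t) + \sqrt{\mathcal{E}_2}\,b_2 g_2(t) + n(t) \tag{15-3-39}$$

with $g_1(t)$ and $g_2(t)$, we obtain 

$$\mathbf{r}_2 = \begin{bmatrix} \sqrt{\mathcal{E}_1}\,b_1 + \rho\sqrt{\mathcal{E}_2}\,b_2 + n_1 \\ \rho\sqrt{\mathcal{E}_1}\,b_1 + \sqrt{\mathcal{E}_2}\,b_2 + n_2 \end{bmatrix} \tag{15-3-40}$$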


where $n_1$ and $n_2$ are the noise components at the output of the correlators. 
Therefore, 

$$\mathbf{b}_2^0 = \mathbf{R}_s^{-1}\mathbf{r}_2 = \begin{bmatrix} \sqrt{\mathcal{E}_1}\,b_1 + (n_1 - \rho n_2)/(1-\rho^2) \\ \sqrt{\mathcal{E}_2}\,b_2 + (n_2 - \rho n_1)/(1-\rho^2) \end{bmatrix} \tag{15-3-41}$$

This is a very interesting result, because the transformation $\mathbf{R}_s^{-1}$ has eliminated 
the interference components between the two users. Consequently, the 
near-far problem is eliminated and there is no need for power control. 

It is interesting to note that a result similar to (15-3-41) is obtained if we 
correlate r(t) given by (15-3-39) with the two modified signature waveforms 

$$g_1'(t) = g_1(t) - \rho\,g_2(t) \tag{15-3-42}$$

$$g_2'(t) = g_2(t) - \rho\,g_1(t) \tag{15-3-43}$$

This means that, by correlating the received signal with the modified signature 
waveforms, we have tuned out or decorrelated the multiuser interference. 
Hence, the detector based on (15-3-34) is called a decorrelating detector. 

In asynchronous transmission, the received signal at the output of the 
correlators is given by (15-3-19). Hence, the log-likelihood function is given as 

$$\Lambda(\mathbf{b}) = (\mathbf{r} - \mathbf{R}_N\mathbf{b})'\,\mathbf{R}_N^{-1}(\mathbf{r} - \mathbf{R}_N\mathbf{b}) \tag{15-3-44}$$

where $\mathbf{R}_N$ is defined by (15-3-23) and $\mathbf{b}$ is given by (15-3-21). It is relatively 
easy to show that the vector $\mathbf{b}$ that minimizes $\Lambda(\mathbf{b})$ is 

$$\mathbf{b}^0 = \mathbf{R}_N^{-1}\mathbf{r} \tag{15-3-45}$$

This is the ML estimate of $\mathbf{b}$ and it is again obtained by performing a linear 
transformation of the outputs from the bank of correlators or matched filters. 
Since $\mathbf{r} = \mathbf{R}_N\mathbf{b} + \mathbf{n}$, it follows from (15-3-45) that 

$$\mathbf{b}^0 = \mathbf{b} + \mathbf{R}_N^{-1}\mathbf{n} \tag{15-3-46}$$

Therefore, $\mathbf{b}^0$ is an unbiased estimate of $\mathbf{b}$. This means that the multiuser 
interference has been eliminated, as in the case of symbol-synchronous 
transmission. Hence, this detector for asynchronous transmission is also called 
a decorrelating detector. 

A computationally efficient method for obtaining the solution given by 
(15-3-45) is the square-root factorization method described in Appendix D. Of 
course, there are many other methods that may be used to invert the matrix 
$\mathbf{R}_N$. Iterative methods to decorrelate the signals have also been explored. 
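For two synchronous users the decorrelating operation reduces to a closed-form 2 × 2 inverse, which makes the interference cancellation easy to verify. The sketch below is noise-free and uses assumed values (cross-correlation 0.5, energies 1 and 9); `decorrelate_two_users` is a hypothetical helper name.

```python
import math

def decorrelate_two_users(r1, r2, rho):
    """Apply the inverse of [[1, rho], [rho, 1]] to the correlator outputs."""
    d = 1.0 - rho * rho
    return (r1 - rho * r2) / d, (r2 - rho * r1) / d

def sign(x):
    return 1 if x >= 0 else -1

rho = 0.5
b1, b2 = 1, -1
E1, E2 = 1.0, 9.0                 # strong interferer (near-far situation)

# Noise-free correlator outputs for the two users.
r1 = math.sqrt(E1) * b1 + rho * math.sqrt(E2) * b2
r2 = rho * math.sqrt(E1) * b1 + math.sqrt(E2) * b2

est1, est2 = decorrelate_two_users(r1, r2, rho)
decisions = (sign(est1), sign(est2))
```

Although the raw output `r1` is negative, the decorrelated estimate for user 1 equals the transmitted amplitude, independent of how strong the second user is.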

Minimum Mean-Square-Error Detector In the above discussion, we 
showed that the linear ML estimate of $\mathbf{b}$ is obtained by minimizing the 
quadratic log-likelihood function in (15-3-44). Thus, we obtained the result 
given by (15-3-45), which is an estimate derived by performing a linear 
transformation on the outputs of the bank of correlators or matched filters. 

Another, somewhat different, solution is obtained if we seek the linear 
transformation $\mathbf{b}^0 = \mathbf{A}\mathbf{r}$, where the matrix $\mathbf{A}$ is to be determined so as to 
minimize the mean square error (MSE) 

$$J(\mathbf{b}) = E[(\mathbf{b} - \mathbf{b}^0)'(\mathbf{b} - \mathbf{b}^0)] = E[(\mathbf{b} - \mathbf{A}\mathbf{r})'(\mathbf{b} - \mathbf{A}\mathbf{r})] \tag{15-3-47}$$

It is easily shown that the optimum choice of $\mathbf{A}$ that minimizes $J(\mathbf{b})$ is 

$$\mathbf{A}^0 = (\mathbf{R}_N + \tfrac{1}{2}N_0\mathbf{I})^{-1} \tag{15-3-48}$$

and, hence, 

$$\mathbf{b}^0 = (\mathbf{R}_N + \tfrac{1}{2}N_0\mathbf{I})^{-1}\mathbf{r} \tag{15-3-49}$$

The output of the detector is then $\hat{\mathbf{b}} = \operatorname{sgn}(\mathbf{b}^0)$. 

The estimate given by (15-3-49) is called the minimum MSE (MMSE) 
estimate of $\mathbf{b}$. Note that when $\tfrac{1}{2}N_0$ is small compared with the diagonal 
elements of $\mathbf{R}_N$, the MMSE solution approaches the ML solution given by 
(15-3-45). On the other hand, when the noise level is large compared with the 
signal level in the diagonal elements of $\mathbf{R}_N$, $\mathbf{A}^0$ approaches the identity matrix 
(scaled by $2/N_0$). In this low-SNR case, the detector basically ignores the 
interference from other users, because the additive noise is the dominant term. 
It should also be noted that the MMSE criterion produces a biased estimate of 
$\mathbf{b}$. Hence, there is some residual multiuser interference. 
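The two limiting behaviours of the MMSE transformation can be checked with a small closed-form case. The sketch below assumes two synchronous users, so that the 2 × 2 matrix $[[1, \rho], [\rho, 1]]$ stands in for the block matrix of the asynchronous model; `mmse_two_users` and the numerical values are hypothetical.

```python
def mmse_two_users(r1, r2, rho, N0):
    """Solve ([[1, rho], [rho, 1]] + (N0/2) I) b = r in closed form."""
    a = 1.0 + 0.5 * N0                 # diagonal entry after adding (N0/2) I
    det = a * a - rho * rho
    return (a * r1 - rho * r2) / det, (a * r2 - rho * r1) / det

rho = 0.5
r1, r2 = -0.5, -2.5                    # noise-free outputs for bits (+1, -1), energies (1, 9)

low_noise = mmse_two_users(r1, r2, rho, 1e-9)    # close to the decorrelator output
high_noise = mmse_two_users(r1, r2, rho, 1e6)    # close to a scaled copy of r
```

With very small noise the result is approximately (1, -3), the decorrelator output; with very large noise the estimate is nearly proportional to the raw correlator outputs, so the interference is no longer removed.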

To perform the computations that lead to the values of $\mathbf{b}$, we solve the set of 
linear equations 

$$(\mathbf{R}_N + \tfrac{1}{2}N_0\mathbf{I})\,\mathbf{b} = \mathbf{r} \tag{15-3-50}$$

This solution may be computed efficiently using a square-root factorization of 
the matrix $\mathbf{R}_N + \tfrac{1}{2}N_0\mathbf{I}$ as indicated above. Thus, to detect NK bits requires 
$3NK^2$ multiplications. Therefore, the computational complexity is 3K 
multiplications per bit, which is independent of the block length N and is linear 
in K. 

Other Types of Detectors The decorrelating detector and the MMSE 
detector described above involve performing linear transformations on a block 
of data from a bank of K correlators or matched filters. The MMSE detector is 
akin to the linear MSE equalizer described in Chapter 10. Consequently, 
MMSE multiuser detection can be implemented by employing a 
tapped-delay-line filter with adjustable coefficients for each user and selecting the filter 
coefficients to minimize the MSE for each user signal. Thus, the received 
information bits are estimated sequentially with finite delay, instead of as a 
block. 

The estimate $\mathbf{b}^0$ given by (15-3-46), which is obtained by processing a block 
of N bits by a decorrelating detector, can also be computed sequentially. Xie et 
al. (1990) have demonstrated that the transmitted bits may be recovered 
sequentially from the received signal, by employing a form of a 
decision-feedback equalizer with finite delay. Thus, there is a similarity between the 
detection of signals corrupted by ISI in a single-user communication system 
and the detection of signals in a multiuser system with asynchronous 
transmission. 


15-3-4 Performance Characteristics of Detectors 

The bit error probability is generally the desirable performance measure in 
multiuser communications. In evaluating the effect of multiuser interference on 
the performance of the detector for a single user, we may use as a benchmark 
the probability of a bit error for a single-user receiver in the absence of other 
users of the channel, which is 

$$P_k(\gamma_k) = Q(\sqrt{2\gamma_k}) \tag{15-3-51}$$

where $\gamma_k = \mathcal{E}_k/N_0$, $\mathcal{E}_k$ is the signal energy per bit and $\tfrac{1}{2}N_0$ is the power spectral 
density of the AWGN. 

In the case of the optimum detector for either synchronous or asynchronous 
transmission, the probability of error is extremely difficult and tedious to 
evaluate. In this case, we may use (15-3-51) as a lower bound and the 
performance of a suboptimum detector as an upper bound. 

Let us consider, first, the suboptimum, conventional single-user detector. 
For synchronous transmission, the output of the correlator for the kth user is 
given by (15-3-27). Therefore, the probability of error for the kth user, 
conditional on a sequence $\mathbf{b}_j$ of bits from other users, is 

$$P_k(\mathbf{b}_j) = Q\left(\sqrt{2\Big[\sqrt{\mathcal{E}_k} + \sum_{j \ne k} \sqrt{\mathcal{E}_j}\,b_j(1)\,\rho_{jk}(0)\Big]^2 \Big/ N_0}\right) \tag{15-3-52}$$

Then, the average probability of error is simply 

$$P_k = \left(\tfrac{1}{2}\right)^{K-1} \sum_{\mathbf{b}_j} P_k(\mathbf{b}_j) \tag{15-3-53}$$

The probability in (15-3-53) will be dominated by the term that has the 
smallest argument in the Q function. The smallest argument will result in an 
SNR of 

$$(\text{SNR})_{\min} = \frac{1}{N_0}\Big[\sqrt{\mathcal{E}_k} - \sum_{j \ne k} \sqrt{\mathcal{E}_j}\,|\rho_{jk}(0)|\Big]^2 \tag{15-3-54}$$

Therefore, 

$$\left(\tfrac{1}{2}\right)^{K-1} Q\big(\sqrt{2(\text{SNR})_{\min}}\big) < P_k < Q\big(\sqrt{2(\text{SNR})_{\min}}\big) \tag{15-3-55}$$

A similar development can be used to obtain bounds on the performance for 
asynchronous transmission. 
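For a small number of synchronous users, the average over interferer bit patterns can be evaluated directly and compared with bounds built from the worst-case pattern. The sketch below makes several assumptions: the function names are hypothetical, the three-user correlation matrix and noise level are illustrative, and the Q-function argument keeps the sign of the decision amplitude (which agrees with the squared form whenever that amplitude is positive, as it is here).

```python
import itertools
import math

def Q(x):
    """Gaussian tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def conventional_error_prob(k, energies, R, N0):
    """Average error probability of user k over all 2^(K-1) interferer patterns."""
    others = [j for j in range(len(energies)) if j != k]
    total = 0.0
    for bits in itertools.product((-1, 1), repeat=len(others)):
        amp = math.sqrt(energies[k]) + sum(
            math.sqrt(energies[j]) * b * R[j][k] for j, b in zip(others, bits))
        total += Q(amp * math.sqrt(2.0 / N0))
    return total / 2 ** len(others)

def snr_min(k, energies, R, N0):
    """Worst-case SNR: every interferer subtracts from the desired amplitude."""
    worst = math.sqrt(energies[k]) - sum(
        math.sqrt(energies[j]) * abs(R[j][k])
        for j in range(len(energies)) if j != k)
    return worst * worst / N0

R = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]]   # assumed correlations
energies = [1.0, 1.0, 1.0]
N0 = 0.1

p0 = conventional_error_prob(0, energies, R, N0)
upper = Q(math.sqrt(2.0 * snr_min(0, energies, R, N0)))
lower = upper / 2 ** (len(energies) - 1)
single_user = Q(math.sqrt(2.0 * energies[0] / N0))
```

In this example the exact average lies between the two bounds and is well above the single-user benchmark, which shows how much the worst-case interference pattern dominates.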

In the case of a decorrelating detector, the other-user interference is 
completely eliminated. Hence, the probability of error may be expressed as 

$$P_k = Q\left(\sqrt{\mathcal{E}_k/\sigma_k^2}\right) \tag{15-3-56}$$

where $\sigma_k^2$ is the variance of the noise in the kth element of the estimate $\mathbf{b}^0$. 


Example 15-3-1 

Consider the case of synchronous, two-user transmission, where $\mathbf{b}_2^0$ is given 
by (15-3-41). Let us determine the probability of error. 

The signal component for the first term in (15-3-41) is $\sqrt{\mathcal{E}_1}\,b_1$. The noise 
component is 

$$n = \frac{n_1 - \rho n_2}{1 - \rho^2}$$

where ρ is the correlation between the two signature signals. The variance 
of this noise is 

$$\sigma_1^2 = \frac{E[(n_1 - \rho n_2)^2]}{(1-\rho^2)^2} = \frac{1}{1-\rho^2}\,\frac{N_0}{2} \tag{15-3-57}$$

and 

$$P_1 = Q\left(\sqrt{\frac{2\mathcal{E}_1}{N_0}(1-\rho^2)}\right) \tag{15-3-58}$$

A similar result is obtained for the performance of the second user. 
Therefore, the noise variance has increased by the factor $(1-\rho^2)^{-1}$. This 
noise enhancement is the price paid for the elimination of the multiuser 
interference by the decorrelation detector. 

The error rate performance of the MMSE detector is similar to that for the 
decorrelation detector when the noise level is low. For example, from 
(15-3-49), we observe that when $N_0$ is small relative to the diagonal elements of 
the signal correlation matrix $\mathbf{R}_N$, 

$$\mathbf{b}^0 \approx \mathbf{R}_N^{-1}\mathbf{r} \tag{15-3-59}$$

which is the solution for the decorrelation detector. For low multiuser 
interference, the MMSE detector results in a smaller noise enhancement 
compared with the decorrelation detector, but has some residual bias resulting 
from the other users. Thus, the MMSE detector attempts to strike a balance 
between the residual interference and the noise enhancement. 
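The noise enhancement of the two-user decorrelating detector in Example 15-3-1 can be put into numbers by comparing its error probability with the single-user benchmark (15-3-51). The following is a small sketch; the function names and the chosen operating point are assumptions for illustration.

```python
import math

def Q(x):
    """Gaussian tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def single_user_pe(E, N0):
    """Single-user benchmark: Q(sqrt(2 E / N0))."""
    return Q(math.sqrt(2.0 * E / N0))

def decorrelator_pe(E, N0, rho):
    """Two-user decorrelator: the effective SNR is reduced by (1 - rho^2)."""
    return Q(math.sqrt(2.0 * E * (1.0 - rho * rho) / N0))

def snr_loss_db(rho):
    """SNR penalty of the two-user decorrelator relative to a single user, in dB."""
    return -10.0 * math.log10(1.0 - rho * rho)

E, N0 = 1.0, 0.1                      # assumed operating point (10 dB)
benchmark = single_user_pe(E, N0)
with_decorrelator = decorrelator_pe(E, N0, rho=0.5)
loss = snr_loss_db(0.5)               # about 1.25 dB for rho = 0.5
```

The penalty depends only on the cross-correlation and not on the energy of the interfering user, which is the sense in which the decorrelator is insensitive to the near-far problem.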

An alternative to the error probability as a figure of merit that has been 
used to characterize the performance of a multiuser communication system is 
the ratio of SNRs with and without the presence of interference. In particular, 
(15-3-51) gives the error probability of the kth user in the absence of 
other-user interference. In this case, the SNR is $\gamma_k = \mathcal{E}_k/N_0$. In the presence of 
multiuser interference, the user that transmits a signal with energy $\mathcal{E}_k$ will have 
an error probability $P_k$ that exceeds $P_k(\gamma_k)$. The effective SNR $\gamma_{ke}$ is defined as 
the SNR required to achieve the error probability 

$$P_k = P_k(\gamma_{ke}) = Q(\sqrt{2\gamma_{ke}}) \tag{15-3-60}$$

The efficiency is defined as the ratio $\gamma_{ke}/\gamma_k$ and represents the performance 
loss due to the multiuser interference. The desirable figure of merit is the 
asymptotic efficiency, defined as 

$$\eta_k = \lim_{N_0 \to 0} \frac{\gamma_{ke}}{\gamma_k} \tag{15-3-61}$$

This figure of merit is often simpler to compute than the probability of error. 


Example 15-3-2 


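Consider the case of two symbol-synchronous users with signal energies $\mathcal{E}_1$ 
and $\mathcal{E}_2$. Let us determine the asymptotic efficiency of the conventional 
detector. 

In this case, the probability of error is easily obtained from (15-3-52) and 
(15-3-53) as 

$$P_1 = \tfrac{1}{2}\,Q\left(\sqrt{2(\sqrt{\mathcal{E}_1} + \rho\sqrt{\mathcal{E}_2})^2/N_0}\right) + \tfrac{1}{2}\,Q\left(\sqrt{2(\sqrt{\mathcal{E}_1} - \rho\sqrt{\mathcal{E}_2})^2/N_0}\right)$$

However, the asymptotic efficiency is much easier to compute. It follows 
from the definition (15-3-61) and from (15-3-52) that 

$$\eta_1 = \left[\max\left\{0,\ 1 - \sqrt{\frac{\mathcal{E}_2}{\mathcal{E}_1}}\,|\rho|\right\}\right]^2$$

A similar expression is obtained for $\eta_2$. 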
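The asymptotic efficiency of the two-user conventional detector follows from the dominant Q-function term, whose amplitude is $\sqrt{\mathcal{E}_1} - |\rho|\sqrt{\mathcal{E}_2}$; dividing the resulting effective SNR by $\mathcal{E}_1/N_0$ gives $[\max\{0, 1 - \sqrt{\mathcal{E}_2/\mathcal{E}_1}\,|\rho|\}]^2$. For comparison, the two-user decorrelator of Example 15-3-1 has effective SNR $(\mathcal{E}_1/N_0)(1-\rho^2)$ and hence efficiency $1-\rho^2$. The sketch below evaluates both; the function names are hypothetical.

```python
import math

def conventional_efficiency(E1, E2, rho):
    """Asymptotic efficiency of user 1 for the two-user conventional detector."""
    return max(0.0, 1.0 - math.sqrt(E2 / E1) * abs(rho)) ** 2

def decorrelator_efficiency(rho):
    """Asymptotic efficiency of the two-user decorrelating detector."""
    return 1.0 - rho * rho

rho = 0.5
equal_power = conventional_efficiency(1.0, 1.0, rho)      # 0.25
strong_interferer = conventional_efficiency(1.0, 9.0, rho)  # 0.0
decorrelated = decorrelator_efficiency(rho)               # 0.75, for any E2
```

The conventional detector's efficiency collapses to zero once the interferer is strong enough, while the decorrelator's value does not depend on the interferer's energy.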


The asymptotic efficiency of the optimum and suboptimum detectors that 
we have described has been evaluated by Verdu (1986), Lupas and Verdu 
(1989), and Xie et al. (1990). Figure 15-3-2 illustrates the asymptotic efficiencies 
of these detectors when K = 2 users are transmitting synchronously. These 
graphs show that when the interference is small ($\mathcal{E}_2 \to 0$), the asymptotic 
efficiencies of these detectors are relatively large (near unity) and comparable. 
As $\mathcal{E}_2$ increases, the asymptotic efficiency of the conventional detector 
deteriorates rapidly. However, the other linear detectors perform relatively 
well compared with the optimum detector. Similar conclusions are reached by 
computing the error probabilities, but these computations are often more 
tedious. 

FIGURE 15-3-2 Asymptotic efficiencies of optimum (Viterbi) detector, conventional detector, MMSE detector, 
and linear ML detector in a two-user synchronous DS/SSMA system. [From Xie et al. (1990), 
© IEEE.] 

15-4 RANDOM ACCESS METHODS 

In this section, we consider a multiuser communication system in which users 
transmit information in packets over a common channel. In contrast to the 
CDMA method described in Section 15-3, the information signals of the users 
are not spread in frequency. As a consequence, simultaneous transmissions 
from multiple users cannot be separated at the receiver. The access 
methods described below are basically random, because packets are generated 
according to some statistical model. Users access the channel when they have 
one or more packets to transmit. When more than one user attempts to 
transmit packets simultaneously, the packets overlap in time, i.e., they collide, 
and, hence, a conflict results, which must be resolved by devising some channel 
protocol for retransmission of the packets. Below, we describe several random 
access channel protocols that resolve conflicts in packet transmission. 

FIGURE 15-4-1 Random access packet transmission: (a) packets from a typical user; 
(b) packets from several users. 

15-4-1 ALOHA Systems and Protocols 

Suppose that a random access scheme is employed where each user transmits a 
packet as soon as it is generated. When a packet is transmitted by a user and 
no other user transmits a packet for the duration of the time interval, then the 
packet is considered successfully transmitted. However, if one or more of the 
other users transmits a packet that overlaps in time with the packet from the 
first user, a collision occurs and the transmission is unsuccessful. Figure 15-4-1 
illustrates this scenario. If the users know when their packets are transmitted 
successfully and when they have collided with other packets, it is possible to 
devise a scheme, which we may call a channel access protocol, for retransmis- 
sion of collided packets. 

Feedback to the users regarding the successful or unsuccessful transmission 
of packets is necessary and can be provided in a number of ways. In a radio 
broadcast system, such as one that employs a satellite relay as depicted in Fig. 
15-4-2, the packets are broadcast to all the users on the down-link. Hence, all 
the transmitters can monitor their transmissions and, thus, obtain the following 
ternary information: no packet was transmitted, or a packet was transmitted 
successfully, or a collision occurred. This type of feedback to the transmitters is 
generally denoted as (0, 1, c) feedback. In systems that employ wireline or 
fiber-optic channels, the receiver may transmit the feedback signal on a 
separate channel. 

FIGURE 15-4-2 Broadcast system. 

The ALOHA system devised by Abramson (1973, 1977) and others at the 
University of Hawaii employs a satellite repeater that broadcasts the packets 
received from the various users who access the satellite. In this case, all the 
users can monitor the satellite transmissions and, thus, establish whether or not 
their packets have been transmitted successfully. 

There are basically two types of ALOHA systems: synchronized or slotted 
and unsynchronized or unslotted. In an unslotted ALOHA system, a user may 
begin transmitting a packet at any arbitrary time. In a slotted ALOHA, the 
packets are transmitted in time slots that have specified beginning and ending 
times. 

We assume that the start time of packets that are transmitted is a Poisson 
point process having an average rate of λ packets/s. Let $T_p$ denote the time 
duration of a packet. Then, the normalized channel traffic G, also called the 
offered channel traffic, is defined as 

$$G = \lambda T_p \tag{15-4-1}$$

There are many channel access protocols that can be used to handle 
collisions. Let us consider the one due to Abramson (1973). In Abramson's 
protocol, packets that have collided are retransmitted with some delay τ, 
where τ is randomly selected according to the pdf 

$$p(\tau) = \alpha e^{-\alpha\tau} \tag{15-4-2}$$

where α is a design parameter. The random delay τ is added to the time of the 
initial transmission and the packet is retransmitted at the new time. If a 
collision occurs again, a new value of τ is randomly selected and the packet is 
retransmitted with a new delay from the time of the second transmission. This 
process is continued until the packet is transmitted successfully. The design 
parameter α determines the average delay between retransmissions. The 
smaller the value of α, the longer the delay between retransmissions. 
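The exponential retransmission delay of (15-4-2) has mean 1/α, which is why a smaller design parameter gives longer delays. A minimal sketch of drawing such delays with the standard library follows; the function name, the seed, and the value of α are assumptions for illustration.

```python
import random

def retransmission_delay(alpha, rng):
    """Draw one delay with pdf alpha * exp(-alpha * tau)."""
    return rng.expovariate(alpha)

rng = random.Random(0)            # fixed seed, illustrative only
alpha = 0.5                       # assumed design parameter
samples = [retransmission_delay(alpha, rng) for _ in range(20000)]
mean_delay = sum(samples) / len(samples)   # close to 1 / alpha
```

Halving α would double the average delay, spreading the retransmissions of collided packets over a longer interval.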

Now, let λ', where λ' < λ, be the rate at which packets are transmitted 
successfully. Then, the normalized channel throughput is 

$$S = \lambda' T_p \tag{15-4-3}$$

We can relate the channel throughput S to the offered channel traffic G by 
making use of the assumed start time distribution. The probability that a 
packet will not overlap a given packet is simply the probability that no packet 
begins $T_p$ s before or $T_p$ s after the start time of the transmitted packet. Since 
the start time of all packets is Poisson-distributed, the probability that a packet 
will not overlap is $\exp(-2\lambda T_p) = \exp(-2G)$. Therefore, 

$$S = Ge^{-2G} \tag{15-4-4}$$

FIGURE 15-4-3 Throughput in ALOHA systems. 

This relationship is plotted in Fig. 15-4-3. We observe that the maximum 
throughput is $S_{\max} = 1/2e = 0.184$ packets per slot, which occurs at $G = \tfrac{1}{2}$. 
When $G > \tfrac{1}{2}$, the throughput S decreases. The above development illustrates 
that an unsynchronized or unslotted random access method has a relatively 
small throughput and is inefficient. 
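The location and value of the maximum of $S = Ge^{-2G}$ can be confirmed with a simple grid search. This is only a numerical check; the function name and the grid spacing are arbitrary choices.

```python
import math

def pure_aloha_throughput(G):
    """Unslotted ALOHA throughput S = G exp(-2G)."""
    return G * math.exp(-2.0 * G)

grid = [i / 1000.0 for i in range(1, 3001)]      # offered traffic from 0.001 to 3
G_best = max(grid, key=pure_aloha_throughput)
S_max = pure_aloha_throughput(G_best)
```

The search returns an offered traffic of one half and a peak throughput of about 0.184, and the throughput falls on either side of that point.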

Throughput for slotted ALOHA To determine the throughput in a 
slotted ALOHA system, let $G_i$ be the probability that the ith user will transmit 
a packet in some slot. If all the K users operate independently and there is no 
statistical dependence between the transmission of the user's packet in the 
current slot and the transmission of the user's packet in previous time slots, the 
total (normalized) offered channel traffic is 

$$G = \sum_{i=1}^{K} G_i \tag{15-4-5}$$

Note that, in this case, G may be greater than unity. 

Now, let $S_i \le G_i$ be the probability that a packet transmitted in a time slot is 
received without a collision. Then, the normalized channel throughput is 

$$S = \sum_{i=1}^{K} S_i \tag{15-4-6}$$

The probability that a packet from the ith user will not have a collision with 
another packet is 

$$Q_i = \prod_{\substack{j=1 \\ j \ne i}}^{K} (1 - G_j) \tag{15-4-7}$$

Therefore, 

$$S_i = G_i Q_i \tag{15-4-8}$$

A simple expression for the channel throughput is obtained by considering 
K identical users. Then, 

$$S_i = \frac{S}{K}, \qquad G_i = \frac{G}{K}$$

and 

$$S = G\left(1 - \frac{G}{K}\right)^{K-1} \tag{15-4-9}$$

Then, if we let $K \to \infty$, we obtain the throughput 

$$S = Ge^{-G} \tag{15-4-10}$$

This result is also plotted in Fig. 15-4-3. We observe that S reaches a maximum 
throughput of $S_{\max} = 1/e = 0.368$ packets per slot at G = 1, which is twice the 
throughput of the unslotted ALOHA system. 
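With K identical users, each transmitting in a slot with probability G/K, the throughput is $S = G(1 - G/K)^{K-1}$, which tends to $Ge^{-G}$ as K grows. The sketch below compares the two expressions numerically; the function names and the user counts are illustrative assumptions.

```python
import math

def slotted_throughput_finite(G, K):
    """Slotted ALOHA throughput for K identical users."""
    return G * (1.0 - G / K) ** (K - 1)

def slotted_throughput(G):
    """Limit for a large user population: S = G exp(-G)."""
    return G * math.exp(-G)

S_limit = slotted_throughput(1.0)                 # 1/e, about 0.368
S_two_users = slotted_throughput_finite(1.0, 2)   # 0.5
S_many_users = slotted_throughput_finite(1.0, 1000)
```

For a small population the throughput at G = 1 is higher than the limiting value, because each user then competes with only a few others; the limit 1/e is approached as K increases.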

The performance of the slotted ALOHA system given above is based on 
Abramson’s protocol for handling collisions. A higher throughput is possible 
by devising a better protocol. 

A basic weakness in Abramson's protocol is that it does not take into 
account the information on the amount of traffic on the channel that is 
available from observation of the collisions that occur. An improvement in 
throughput of the slotted ALOHA system can be obtained by using a tree-type 
protocol devised by Capetanakis (1979). In this algorithm, users are not 
allowed to transmit new packets that are generated until all earlier collisions 
are resolved. A user can transmit a new packet in a time slot immediately 
following its generation, provided that all previous packets that have collided 
have been transmitted successfully. If a new packet is generated while the 
channel is clearing the previous collisions, the packet is stored in a buffer. 
When a new packet collides with another, each user assigns its respective 
packet to one of two sets, say A or B, with equal probability (by flipping a 
coin). Then, if a packet is put in set A, the user transmits it in the next time 
slot. If it collides again, the user will again randomly assign the packet to one 
of two sets and the process of transmission is repeated. This process continues 
until all packets contained in set A are transmitted successfully. Then, all 
packets in set B are transmitted following the same procedure. All the users 
monitor the state of the channel, and, hence, they know when all the collisions 
have been serviced. 
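
The splitting step described above can be sketched as follows. This is a simplified illustration, not Capetanakis's full algorithm: it resolves a single initial collision, assumes that no new packets enter while the collision is being cleared, and counts every idle, successful, or collided slot. The names `resolve_collision`, `set_a`, and `set_b` are hypothetical.

```python
import random


def resolve_collision(packets, rng):
    """Resolve one collision by repeated random splitting into sets A and B.

    Returns (order, slots): the order in which packets get through and the
    number of time slots used, counting the slot of the initial collision.
    """
    order = []
    slots = 0
    stack = [list(packets)]          # sets waiting for the channel; top goes next
    while stack:
        group = stack.pop()
        slots += 1                   # each set occupies exactly one time slot
        if len(group) == 1:
            order.append(group[0])   # successful transmission
        elif len(group) > 1:
            # Collision: each user flips a fair coin to choose set A or set B.
            set_a, set_b = [], []
            for packet in group:
                (set_a if rng.random() < 0.5 else set_b).append(packet)
            stack.append(set_b)      # B waits until A is completely resolved
            stack.append(set_a)      # A is transmitted in the next slot
        # len(group) == 0 is an idle slot: nothing to do
    return order, slots


if __name__ == "__main__":
    order, slots = resolve_collision(["u1", "u2", "u3", "u4"], random.Random(1))
    print("delivery order:", order, "slots used:", slots)
```

Using a stack reproduces the rule that all packets in set A, including any subsets created by further collisions, are cleared before set B is served.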

When the channel becomes available for transmission of new packets, the 
earliest generated packets are transmitted first. To establish a queue, the time 
scale is subdivided into subintervals of sufficiently short duration such that, on 
average, approximately one packet is generated by a user in a subinterval. 
Thus, each packet has a "time tag" that is associated with the subinterval in 
which it was generated. Then, a new packet belonging to the first subinterval is 
transmitted in the first available time slot. If there is no collision then a packet 
from the second subinterval is transmitted, and so on. This procedure 
continues as new packets are generated and as long as any backlog of packets 
for transmission exists. Capetanakis has demonstrated that this channel access 
protocol achieves a maximum throughput of 0.43 packets per slot. 

In addition to throughput, another important performance measure in a 
random access system is the average transmission delay in transmitting a 
packet. In an ALOHA system, the average number of transmissions per packet 
is G/S. To this number we may add the average waiting time between 
transmissions and, thus, obtain an average delay for a successful transmission. 
We recall from the above discussion that in the Abramson protocol, the 
parameter α determines the average delay between retransmissions. If we 
select α small, we obtain the desirable effect of smoothing out the channel load 
at times of peak loading, but the result is a long retransmission delay. This is 
the trade-off in the selection of α in (15-4-2). On the other hand, the 
Capetanakis protocol has been shown to have a smaller average delay in the 
transmission of packets. Hence, it outperforms Abramson’s protocol in both 
average delay and throughput. 
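
To make the quantity G/S concrete, the sketch below evaluates the average number of transmissions per packet. It assumes the slotted throughput $S = Ge^{-G}$ of (15-4-10) and, for the unslotted system, the standard result $S = Ge^{-2G}$; the function name `mean_transmissions` is hypothetical, and waiting time between retransmissions is not modeled.

```python
import math


def mean_transmissions(G, slotted=True):
    """Average number of transmissions per packet, G/S, for offered traffic G > 0."""
    if slotted:
        S = G * math.exp(-G)          # slotted ALOHA throughput
    else:
        S = G * math.exp(-2.0 * G)    # unslotted ALOHA throughput (assumed form)
    return G / S


if __name__ == "__main__":
    for G in (0.1, 0.5, 1.0):
        print(f"G = {G}: slotted {mean_transmissions(G):.3f}, "
              f"unslotted {mean_transmissions(G, slotted=False):.3f}")
```

Under these assumptions G/S reduces to $e^{G}$ for the slotted system and $e^{2G}$ for the unslotted one, so the number of attempts grows exponentially with the offered load.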

Another important issue in the design of random access protocols is the 
stability of the protocol. In our treatment of ALOHA-type channel access 
protocols, we implicitly assumed that for a given offered load, an equilibrium 
point is reached where the average number of packets entering the channel is 
equal to the average number of packets transmitted successfully. In fact, it can 
be demonstrated that any channel access protocol, such as the Abramson 
protocol, that does not take into account the number of previous unsuccessful 
transmissions in establishing a retransmission policy is inherently unstable. On 
the other hand, the Capetanakis algorithm differs from the Abramson protocol 
in this respect and has been proved to be stable. A thorough discussion of the 
stability issues of random access protocols is found in the paper by Massey 
(1988). 


15-4-2 Carrier Sense Systems and Protocols 

As we have observed, ALOHA-type (slotted and unslotted) random-access 
protocols yield relatively low throughput. Furthermore, a slotted ALOHA 
system requires that users transmit at synchronized time slots. In channels 
where transmission delays are relatively small, it is possible to design random 

FIGURE 15-4-4 Local area network with bus architecture. 


access protocols that yield higher throughput. An example of such a protocol is 
carrier sensing with collision detection, which is used as a standard Ethernet 
protocol in local area networks. This protocol is generally known as carrier 
sense multiple access with collision detection (CSMA/CD). 

The CSMA/CD protocol is simple. All users listen for transmissions on the 
channel. A user who wishes to transmit a packet seizes the channel when it 
senses that the channel is idle. Collisions may occur when two or more users 
sense an idle channel and begin transmission. When the users that are 
transmitting simultaneously sense a collision, they transmit a special signal, 
called a jam signal, that serves to notify all users of the collision and abort their 
transmissions. Both the carrier sensing feature and the abortion of transmission 
when a collision occurs result in minimizing the channel down-time and, hence, 
yield a higher throughput. 

To elaborate on the efficiency of CSMA/CD, let us consider a local area 
network having a bus architecture, as shown in Fig. 15-4-4. Consider two users 
$U_1$ and $U_2$ at the maximum separation, i.e., at the two ends of the bus, and let 
$\tau_d$ be the propagation delay for a signal to travel the length of the bus. Then, 
the (maximum) time required to sense an idle channel is $\tau_d$. Suppose that $U_1$ 
transmits a packet of duration $T_p$. User $U_2$ may seize the channel $\tau_d$ s later by 
using carrier sensing, and begins to transmit. However, user $U_1$ would not 
know of this transmission until $\tau_d$ s after $U_2$ begins transmission. Hence, we 
may define the time interval $2\tau_d$ as the (maximum) time interval to detect a 
collision. If we assume that the time required to transmit the jam signal is 
negligible, the CSMA/CD protocol yields a high throughput when $2\tau_d \ll T_p$. 
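
The condition $2\tau_d \ll T_p$ is easy to check numerically. The sketch below computes the ratio $2\tau_d/T_p$ from the bus length, the propagation delay per kilometre, the packet length, and the bit rate. The function name and the numerical values in the usage example are hypothetical and chosen only for illustration.

```python
def collision_window_ratio(bus_length_km, delay_us_per_km, packet_bits, bit_rate_bps):
    """Return 2*tau_d / T_p, the collision-detection interval relative to the packet duration."""
    tau_d = bus_length_km * delay_us_per_km * 1e-6   # end-to-end propagation delay in seconds
    T_p = packet_bits / bit_rate_bps                 # packet duration in seconds
    return 2.0 * tau_d / T_p


if __name__ == "__main__":
    # Hypothetical example: 0.5 km bus, 5 microseconds/km, 12000-bit packets at 10 Mbit/s.
    ratio = collision_window_ratio(0.5, 5.0, 12000, 10e6)
    print(f"2*tau_d/T_p = {ratio:.5f}")
```

A ratio much smaller than one means that a collision is detected, and the transmission aborted, after only a small fraction of a packet time has been wasted.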

There are several possible protocols that may be used to reschedule 
transmissions when a collision occurs. One protocol is called nonpersistent 
CSMA, a second is called 1-persistent CSMA, and a generalization of the latter 
is called p-persistent CSMA. 


Nonpersistent CSMA In this protocol, a user that has a packet to transmit 
senses the channel and operates according to the following rule. 

(a) If the channel is idle, the user transmits a packet. 

(b) If the channel is sensed busy, the user schedules the packet 
transmission at a later time according to some delay distribution. At the end of 
the delay interval, the user again senses the channel and repeats steps (a) and 
(b). 

1-Persistent CSMA This protocol is designed to achieve high throughput 
by not allowing the channel to go idle if some user has a packet to transmit. 
Hence, the user senses the channel and operates according to the following 
rule. 

(a) If the channel is sensed idle, the user transmits the packet with 
probability 1. 

(b) If the channel is sensed busy, the user waits until the channel becomes 
idle and transmits a packet with probability one. Note that in this protocol, a 
collision will always occur when more than one user has a packet to transmit. 

p-Persistent CSMA To reduce the rate of collisions in 1-persistent CSMA 
and increase the throughput, we should randomize the starting time for 
transmission of packets. In particular, upon sensing that the channel is idle, a 
user with a packet to transmit sends it with probability p and delays it by τ with 
probability 1 - p. The probability p is chosen in a way that reduces the 
probability of collisions while the idle periods between consecutive 
(nonoverlapping) transmissions are kept small. This is accomplished by subdividing the 
time axis into minislots of duration τ and selecting the packet transmission at 
the beginning of a minislot. In summary, in the p-persistent protocol, a user 
with a packet to transmit proceeds as follows. 

(a) If the channel is sensed idle, the packet is transmitted with probability 
p, and with probability 1 - p the transmission is delayed by τ s. 

(b) If at t = τ, the channel is still sensed to be idle, step (a) is repeated. If a 
collision occurs, the users schedule retransmission of the packets according to 
some preselected transmission delay distribution. 

(c) If at t = τ, the channel is sensed busy, the user waits until it becomes 
idle, and then operates as in (a) and (b) above. 

Slotted versions of the above protocol can also be constructed. 
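
The sensing rules of the three protocols can be summarized in a small decision function. This is a minimal sketch of a single sensing decision only; it does not model propagation delay, collisions, or the retransmission delay distribution. The function name `csma_action` and the action labels it returns are hypothetical.

```python
import random


def csma_action(protocol, channel_idle, p=1.0, rng=random):
    """Return the action taken by a user that has a packet and has just sensed the channel.

    protocol is one of "nonpersistent", "1-persistent", or "p-persistent".
    """
    if protocol == "nonpersistent":
        # Idle: transmit. Busy: draw a random delay and sense again later.
        return "transmit" if channel_idle else "reschedule"
    if protocol == "1-persistent":
        # Idle: transmit with probability 1. Busy: keep sensing until idle.
        return "transmit" if channel_idle else "wait_for_idle"
    if protocol == "p-persistent":
        if not channel_idle:
            return "wait_for_idle"
        # Idle: transmit with probability p, otherwise wait one minislot.
        return "transmit" if rng.random() < p else "defer_one_minislot"
    raise ValueError(f"unknown protocol: {protocol}")


if __name__ == "__main__":
    rng = random.Random(7)
    decisions = [csma_action("p-persistent", True, p=0.3, rng=rng) for _ in range(10)]
    print(decisions)
```

Setting p = 1 in the p-persistent rule reproduces the 1-persistent behavior, which is why the latter is described as a special case.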


The throughput analysis for the nonpersistent and the p-persistent 
CSMA/CD protocols has been performed by Kleinrock and Tobagi (1975), 
based on the following assumptions: 

1 the average retransmission delay is large compared with the packet 
duration $T_p$; 

2 the interarrival times of the point process defined by the start times of 
all the packets plus retransmissions are independent and exponentially 
distributed. 

FIGURE 15-4-5 

For the nonpersistent CSMA, the throughput is 

$$S = \frac{Ge^{-aG}}{G(1 + 2a) + e^{-aG}} \tag{15-4-11}$$

where the parameter $a = \tau_d/T_p$. Note that as $a \to 0$, $S \to G/(1 + G)$. Figure 
15-4-5 illustrates the throughput versus the offered traffic G, with a as a 
parameter. We observe that $S \to 1$ as $G \to \infty$ for a = 0. For a > 0, the value of 
$S_{\max}$ decreases. 
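
The behavior of the nonpersistent throughput can be explored numerically. The sketch below evaluates $S = Ge^{-aG}/[G(1 + 2a) + e^{-aG}]$ and locates its peak over a logarithmic grid of offered traffic values. The function names, the grid limits, and the grid size are assumptions made for this illustration.

```python
import math


def nonpersistent_throughput(G, a):
    """Throughput of nonpersistent CSMA for offered traffic G and a = tau_d / T_p."""
    return G * math.exp(-a * G) / (G * (1.0 + 2.0 * a) + math.exp(-a * G))


def peak_throughput(a, points=5001):
    """Largest throughput found on a logarithmic grid 0.01 <= G <= 1000."""
    grid = (10.0 ** (-2.0 + 5.0 * i / (points - 1)) for i in range(points))
    return max(nonpersistent_throughput(G, a) for G in grid)


if __name__ == "__main__":
    for a in (0.0, 0.001, 0.01, 0.1):
        print(f"a = {a}: peak S on grid = {peak_throughput(a):.3f}")
```

For a = 0 the grid maximum is limited only by the largest G considered, in line with $S \to 1$ as $G \to \infty$; for a > 0 the peak is strictly below one and falls as a increases.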

For the 1-persistent protocol, the throughput obtained by Kleinrock and 
Tobagi (1975) is 

$$S = \frac{G\left[1 + G + aG\left(1 + G + \tfrac{1}{2}aG\right)\right]e^{-G(1+2a)}}{G(1 + 2a) - (1 - e^{-aG}) + (1 + aG)e^{-G(1+a)}} \tag{15-4-12}$$

In this case, 

$$\lim_{a \to 0} S = \frac{G(1 + G)e^{-G}}{G + e^{-G}} \tag{15-4-13}$$

which has a smaller peak value than the nonpersistent protocol. 
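
The 1-persistent expressions can be checked against each other numerically. The sketch below assumes the forms $S = G[1 + G + aG(1 + G + aG/2)]e^{-G(1+2a)} / [G(1+2a) - (1 - e^{-aG}) + (1 + aG)e^{-G(1+a)}]$ for (15-4-12) and $G(1+G)e^{-G}/(G + e^{-G})$ for the limit (15-4-13); the function names are hypothetical.

```python
import math


def one_persistent_throughput(G, a):
    """Throughput of 1-persistent CSMA, assumed form of (15-4-12)."""
    num = G * (1.0 + G + a * G * (1.0 + G + 0.5 * a * G)) * math.exp(-G * (1.0 + 2.0 * a))
    den = (G * (1.0 + 2.0 * a)
           - (1.0 - math.exp(-a * G))
           + (1.0 + a * G) * math.exp(-G * (1.0 + a)))
    return num / den


def one_persistent_limit(G):
    """Limit of the 1-persistent throughput as a -> 0, assumed form of (15-4-13)."""
    return G * (1.0 + G) * math.exp(-G) / (G + math.exp(-G))


if __name__ == "__main__":
    grid = [0.01 * i for i in range(1, 1001)]
    G_best = max(grid, key=one_persistent_limit)
    print(f"peak of the a -> 0 limit: S = {one_persistent_limit(G_best):.4f} at G = {G_best:.2f}")
```

Setting a = 0 in the first function reproduces the second exactly, which is a useful consistency check on the two expressions. The peak of the limiting curve stays well below one, whereas the nonpersistent throughput for a = 0 approaches one.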

By adopting the p-persistent protocol, it is possible to increase the 
throughput relative to the 1-persistent scheme. For example, Fig. 15-4-6 
illustrates the throughput versus the offered traffic with $a = \tau_d/T_p$ fixed and 
with p as a parameter. We observe that as p increases toward unity, the 
maximum throughput decreases. 

The transmission delay was also evaluated by Kleinrock and Tobagi (1975). 
Figure 15-4-7 illustrates the graphs of the delay (normalized by $T_p$) versus the 
throughput S for the slotted nonpersistent and p-persistent CSMA protocols. 
Also shown for comparison is the delay versus throughput characteristic of the 
ALOHA slotted and unslotted protocols. In this simulation, only the newly 
generated packets are derived independently from a Poisson distribution. 
Collisions and uniformly distributed random retransmissions are handled 
without further assumptions. These simulation results illustrate the superior 
performance of the p-persistent and the nonpersistent protocols relative to the 
ALOHA protocols. Note that the graph labeled "optimum p-persistent" is 
obtained by finding the optimum value of p for each value of the throughput. 
We observe that for small values of the throughput, the 1-persistent (p = 1) 
protocol is optimal. 

FIGURE 15-4-6 

FIGURE 15-4-7 Throughput versus delay from simulation (a = 0.01). [From Kleinrock and Tobagi (1975), 
© IEEE.] 

15-5 BIBLIOGRAPHICAL NOTES AND REFERENCES 

FDMA was the dominant multiple access scheme that has been used for 
decades in telephone communication systems for analog voice transmission. 
With the advent of digital speech transmission using PCM, DPCM, and other 
speech coding methods, TDMA has replaced FDMA as the dominant multiple 
access scheme in telecommunications. CDMA and random access methods, in 
general, have been developed over the past three decades, primarily for use in 
wireless signal transmission and in local area wireline networks. 

Multiuser information theory deals with basic information-theoretic limits in 
source coding for multiple sources, and channel coding and modulation for 
multiple access channels. A large amount of literature exists on these topics. In 
the context of our treatment of multiple access methods, the reader will find 
the papers by Cover (1972), El Gamal and Cover (1980), Bergmans and Cover 
(1974), and Hui (1984) particularly relevant. The capacity of a cellular CDMA 
system has been considered in the paper by Gilhousen et al. (1991). 

Signal demodulation and detection for multiuser communications has 
received considerable attention in recent years. The reader is referred to the 
papers by Verdu (1986a-c, 1989), Lupas and Verdu (1990), Xie et al. (1990a, 
b), Poor and Verdu (1988), Zhang and Brady (1993), and Zvonar and Brady 
(1995). Earlier work on signal design and demodulation for multiuser 
communications is found in the papers by Van Etten (1975, 1976), Horwood 
and Gagliardi (1975), and Kaye and George (1970). 

The ALOHA system, which was one of the earliest random access systems, 
is treated in the papers by Abramson (1970, 1977) and Roberts (1975). These 
papers contain the throughput analysis for unslotted and slotted systems. 
Stability issues regarding the ALOHA protocols may be found in the papers by 
Carleial and Hellman (1975), Ghez et al. (1988), and Massey (1988). Stable 
protocols based on tree algorithms for random access channels were first given 
by Capetanakis (1977). The carrier sense multiple access protocols that we 
described are due to Kleinrock and Tobagi (1975). Finally, we mention the 
IEEE Press book edited by Abramson (1993), which contains a collection of 
papers dealing with multiple access communications. 


PROBLEMS 


15-1 In the formulation of the CDMA signal and channel models described in Section 
15-3-1, we assumed that the received signals are real. For K> 1, this assumption 
implies phase synchronism at all transmitters, which is not very realistic in a 
practical system. To accommodate the case where the carrier phases are not 
synchronous, we may simply alter the signature waveforms for the K users, given 
by (15-3-1), to be complex-valued, of the form 

$$g_k(t) = e^{j\theta_k} \sum_{n=0}^{L-1} a_k(n)\,p(t - nT_c), \qquad 1 \le k \le K$$

where $\theta_k$ represents the constant phase offset of the kth transmitter as seen by the 
common receiver. 

a Given this complex-valued form for the signature waveforms, determine the 
form of the optimum ML receiver that computes the correlation metrics 
analogous to (15-3-15). 

b Repeat the derivation for the optimum ML detector for asynchronous transmis- 
sion that is analogous to (15-3-19). 

15-2 Consider a TDMA system where each user is limited to a transmitted power P, 
independent of the number of users. Determine the capacity per user, $C_K$, and the 
total capacity $KC_K$. Plot $C_K$ and $KC_K$ as functions of $\mathcal{E}_b/N_0$ and comment on the 
results as $K \to \infty$. 

15-3 Consider an FDMA system with K = 2 users, in an AWGN channel, where user 1 
is assigned a bandwidth $W_1 = \alpha W$ and user 2 is assigned a bandwidth $W_2 = 
(1 - \alpha)W$, where $0 \le \alpha \le 1$. Let $P_1$ and $P_2$ be the average powers of the two users. 
a Determine the capacities $C_1$ and $C_2$ of the two users and their sum $C = C_1 + C_2$ 
as a function of α. On a two-dimensional graph of the rates $R_2$ versus $R_1$, plot 
the graph of the points $(C_2, C_1)$ as α varies in the range $0 \le \alpha \le 1$. 
b Recall that the rates of the two users must satisfy the conditions 

Determine the total capacity C when $P_1/\alpha = P_2/(1 - \alpha) = P_1 + P_2$, and, thus, 
show that the maximum rate is achieved when $\alpha/(1 - \alpha) = P_1/P_2 = W_1/W_2$. 

15-4 Consider a TDMA system with K = 2 users in an AWGN channel. Suppose that 
the two transmitters are peak-power-limited to $P_1$ and $P_2$, and let user 1 transmit 
for 100α% of the available time and user 2 transmit 100(1 - α)% of the time. The 
available bandwidth is W. 

a Determine the capacities $C_1$, $C_2$, and $C = C_1 + C_2$ as functions of α. 
b Plot the graph of the points $(C_2, C_1)$ as α varies in the range $0 \le \alpha \le 1$. 

15-5 Consider a TDMA system with K = 2 users in an AWGN channel. Suppose that 
the two transmitters are average-power-limited, with powers $P_1$ and $P_2$. User 1 
transmits 100α% of the time and user 2 transmits 100(1 - α)% of the time. The 
channel bandwidth is W. 

a Determine the capacities $C_1$, $C_2$, and $C = C_1 + C_2$ as functions of α. 
b Plot the graph of the points $(C_2, C_1)$ as α varies in the range $0 \le \alpha \le 1$. 
c What is the similarity between this solution and the FDMA system in Problem 
15-3? 

15-6 Consider the two-user, synchronous, multiple-access channel and the signature 
sequences shown in Fig. P15-6. The parameter $A \ge 0$ describes the relative 
strength between the two users, and $0 \le B \le 1$ describes the degree of correlation 
between the waveforms. Let 

$$r(t) = \sum_{k=1}^{2} \sum_{i=-\infty}^{\infty} b_k(i)\,s_k(t - i) + n(t)$$

FIGURE P15-6 (signature waveforms $s_1(t)$ and $s_2(t)$) 

denote the received waveform at time t, where n(t) is white gaussian noise with 
power spectral density $\sigma^2$, and $b_k(i) \in \{-1, +1\}$. In the following problems, you 
will compare the structure of the conventional multiuser detector to optimum 
receiver structures for various values of A, $0 \le B \le 1$, and $\sigma^2$. 
a Show that, given the observation $\{r(t), -\infty < t < \infty\}$, a sufficient statistic for the 
data $b_1(0)$ and $b_2(0)$ is the observation during $t \in [0, 1]$. 
b Conventional (suboptimum) multiuser detection chooses the data $b_k(0)$ according 
to the following rule: 

$$\hat{b}_k(0) = \operatorname{sgn}(y_k)$$

where 

$$y_k = \int_0^1 r(t)\,s_k(t)\,dt$$

Determine an expression for the probability of bit error for user 1, using the 
notation 

$$w_k = \int_0^1 s_k^2(t)\,dt, \qquad \rho_{12} = \int_0^1 s_1(t)\,s_2(t)\,dt$$

c What is the form of this expression for $A \to 0$, B < 1, and arbitrary $\sigma^2$? 
d What is the form of this expression for arbitrarily large A, B < 1, and arbitrary 
$\sigma^2$? What does this say about conventional detection? 
e What is the form of this expression for B = 1, and arbitrary $\sigma^2$ and A? Why 
does this differ from the result in (d)? 

f Determine the form of this expression for arbitrarily large $\sigma^2$, arbitrary A, and 
B < 1. 

g Determine the form of this expression for $\sigma^2 \to 0$, arbitrary A, and B < 1. 

15-7 Refer to Problem 15-6. The maximum-likelihood sequence receiver for this 
channel selects the data $b_1(0)$ and $b_2(0)$ transmitted during the interval [0, 1] 
according to the rule 

$$(\hat{b}_1(0), \hat{b}_2(0)) = \arg\max_{b_1, b_2} \Lambda[\{r(t), 0 < t < 1\} \mid b_1, b_2]$$

where $\Lambda[\{r(t), 0 < t < 1\} \mid b_1, b_2]$ is the likelihood function of $b_1$ and $b_2$ given an 
observation of $\{r(t), 0 < t < 1\}$. It will be helpful to write this maximization as 

$$(\hat{b}_1(0), \hat{b}_2(0)) = \arg\max_{b_1} \arg\max_{b_2} \Lambda[\{r(t), 0 < t < 1\} \mid b_1, b_2]$$

where the value $b_2^*$ that satisfies the inner maximization may depend on $b_1$. Note 
that the need for "sequence detection" is obviated. 

a Express this maximization in the simplest possible terms, using the same 
notation as in Problem 15-6(b). Reduce this maximization to simplest form, 
using facts like 

$$\arg\max_x K e^{f(x)} = \arg\max_x f(x)$$

if, say, K is independent of x. 
FIGURE P15-8 

b What is the simplest structure of the MLS receiver as the relative strength of the 
interferer vanishes, $A \to 0$? How does it compare with conventional detection? 
c What is the simplest structure of the MLS receiver for B = 1 and arbitrary A 
and $\sigma^2$? How does it compare with conventional detection? Why? 
d What is the simplest structure of the MLS receiver for arbitrarily large $\sigma^2$ and 
arbitrary A and B? How does it compare with conventional detection? 
Determine the error rate for user 1 in this case. [Hint: Use the fact that 
$\operatorname{sgn}(y_2) = \operatorname{sgn}(y_2 \pm \rho_{12})$ with high probability in this case.] 
e Determine the error probability of user 1 of the MLS receiver for $\sigma^2 \to 0$, and 
arbitrarily large A and B < 1. How does it compare with conventional 
detection? 

f What is the structure of the MLS receiver for arbitrarily large A, B < 1, and 
arbitrary $\sigma^2$? How does it compare with conventional detection? What does this 
say about conventional detection in this case? [Hint: Use the fact that $E|y_2|$ is 
roughly A times greater than $E|y_1|$.] 

15-8 Consider the asynchronous communication system shown in Fig. P15-8. The two 
receivers are not colocated, and the white noise processes $n^{(1)}(t)$ and $n^{(2)}(t)$ may be 
considered to be independent. The noise processes are identically distributed, with 
power spectral density $\sigma^2$ and zero mean. Since the receivers are not colocated, 
the relative delays between the users are not the same; denote the relative delay 
of user k at receiver i by $\tau_k^{(i)}$. All other signal parameters coincide for the receivers, 
and the received signal at receiver i is 

$$r^{(i)}(t) = \sum_{k=1}^{2} \sum_{l=-\infty}^{\infty} b_k(l)\,s_k(t - lT - \tau_k^{(i)}) + n^{(i)}(t)$$

where $s_k$ has support on [0, T]. You may assume that the receiver i has full 
knowledge of the waveforms, energies, and relative delays $\tau_1^{(i)}$ and $\tau_2^{(i)}$. Although 
receiver i is eventually interested only in the data from transmitter i, note that 
there is a free communication link between the sampler of one receiver, and the 
postprocessing circuitry of the other. Following each postprocessor, the decision is 
attained by threshold detection. In this problem, you will consider options for 
postprocessing and for the communication link in order to improve performance. 
a What is the bit error probability for users 1 and 2 of a receiver pair that does not 
utilize the communication link, and does not perform postprocessing? Use the 
following notation: 

$$\rho_{12}^{(i)} = \int s_1(t - \tau_1^{(i)})\,s_2(t - \tau_2^{(i)})\,dt$$

$$\rho_{21}^{(i)} = \int s_1(t - \tau_1^{(i)})\,s_2(t + T - \tau_2^{(i)})\,dt$$

$$w_k = \int s_k^2(t - \tau_k^{(1)})\,dt = \int s_k^2(t - \tau_k^{(2)})\,dt$$

b Consider a postprocessor for receiver 1 that accepts $y_2(l - 1)$ and $y_2(l)$ from the 
communication link, and implements the following postprocessing on $y_1(l)$: 

$$z_1(l) = y_1(l) - \rho_{21}^{(1)} \operatorname{sgn}[y_2(l - 1)] - \rho_{12}^{(1)} \operatorname{sgn}[y_2(l)]$$

Determine an exact expression for the bit error rate for user 1. 
c Determine the asymptotic multiuser efficiency of the receiver proposed in (b), 
and compare with that in (a). Does this receiver always perform better than that 
proposed in (a)? 

15-9 The baseband waveforms shown in Fig. P15-6 are assigned to two users who share 
the same asynchronous, narrowband channel. Assume that B = 1 and A = 4. We 
should like to compare the performance of several receivers, with a criterion of 

(0) . Since this expression is too complicated in some cases, we shall also be 
interested in comparing the asymptotic multiuser efficiency $\eta_1$ of each receiver. 
Assume that $\tau_1 = 0$ but that $0 < \tau_2 < T$ is fixed and known at the receiver, and 
assume that we have infinite horizon transmission, $2M + 1 \to \infty$. 

a For the conventional, multiuser detector: 

(i) Find the exact bit probability of error for user 1. Express this result in terms 
of $w_1$, $\rho_{12}$, $\rho_{21}$, and σ. [Hint: Conditioning on $b_2(-1)$ and $b_2(0)$ will help.] 

(ii) Plot the asymptotic multiuser efficiency $\eta_1$ as a function of $\tau_2$. Indicate and 
explain the maximum and minimum values of $\eta_1$ in this plot. 
b For the MLS receiver: 

(i) Plot $\eta_1$ as a function of $\tau_2$. Explain maximum and minimum values, and 
compare with (a)(ii). 

(ii) Which error sequences are most likely for each value of $\tau_2$? 
c For the limiting decorrelating detector: 

(i) Find an exact expression for the probability of error for user 1, with similar 
parameters as in (a)(i). [Hint: Don't forget to normalize $\rho_{12}$ and $\rho_{21}$.] 

(ii) Plot $\eta_1$ as a function of $\tau_2$. Explain the minimum value of $\eta_1$ in this case, 
and compare with (a)(ii). 

15-10 The symbol-by-symbol detector that minimizes the probability of a symbol error 
differs from the maximum-likelihood sequence detector. The former is more 
completely described as the detector that selects each $\hat{b}_k(0)$ according to the rule 

$$\hat{b}_k(0) = \arg\max_{b_k(0)} \Lambda[\{r(t), 0 < t < 1\} \mid b_k(0)]$$

a Show that this decision rule minimizes $P[\hat{b}_k(0) \ne b_k(0)]$ among all decision rules 
with observation $\{r(t), 0 < t < 1\}$. Subject to this criterion, it is superior to the 
MLS receiver. 

b Show that the simplest structure of the minimum-probability-of-error receiver 
for user 1 is given by 

$$\hat{b}_1(0) = \arg\max_{b_1} \exp\left(\frac{b_1 y_1}{\sigma^2}\right) \cosh\left(\frac{y_2 - b_1\rho_{12}}{\sigma^2}\right)$$

c Find the simplest form of the minimum-probability-of-error receiver for B = 1 
and arbitrary A and $\sigma^2$. How does this compare with the above receivers? 
d Find the limiting form of the minimum-probability-of-error receiver for 
arbitrarily large $\sigma^2$ and arbitrary A and B. Compare with the above receivers. 
e Find the limiting form of the minimum-probability-of-error receiver for $A \gg 1$ 
and arbitrary $\sigma^2$ and B. Compare with the above receivers. 
f Find the limiting form of the minimum-probability-of-error receiver for $A \gg 1$, 
$\sigma^2 \to 0$, and arbitrary B. Compare with the above receivers. 

15-11 In a pure ALOHA system, the channel bit rate is 2400 bits/s. Suppose that each 
terminal transmits a 100 bit message every minute on the average. 
a Determine the maximum number of terminals that can use the channel. 
b Repeat (a) if slotted ALOHA is used. 

15-12 Determine the maximum input traffic for the pure ALOHA and slotted ALOHA 
protocols. 

15-13 For a Poisson process, the probability of k arrivals in a time interval T is 

$$P(k) = \frac{e^{-\lambda T}(\lambda T)^k}{k!}, \qquad k = 0, 1, 2, \ldots$$

a Determine the average number of arrivals in the interval T. 
b Determine the variance $\sigma^2$ in the number of arrivals in the interval T. 
c What is the probability of at least one arrival in the interval T? 
d What is the probability of exactly one arrival in the interval T? 

15-14 Refer to Problem 15-13. The average arrival rate is λ = 10 packets/s. Determine 
a the average time between arrivals; 

b the probability that another packet will arrive within 1 s; within 100 ms. 

15-15 Consider a pure ALOHA system that is operating with a throughput G = 0.1 and 
packets are generated with a Poisson arrival rate λ. Determine 
a the value of λ; 

b the average number of attempted transmissions to send a packet. 

15-16 Consider a CSMA/CD system in which the transmission rate on the bus is 
10 Mbits/s. The bus is 2 km and the propagation delay is 5 μs/km. Packets are 
1000 bits long. Determine 
a the end-to-end delay $\tau_d$; 
b the packet duration $T_p$; 
c the ratio $\tau_d/T_p$; 

d the maximum utilization of the bus and the maximum bit rate. 



APPENDIX 


THE LEVINSON-DURBIN ALGORITHM 


The Levinson-Durbin algorithm is an order-recursive method for determining the 
solution to the set of linear equations 

$$\boldsymbol{\Phi}_p \mathbf{a}_p = \boldsymbol{\phi}_p \tag{A-1}$$

where $\boldsymbol{\Phi}_p$ is a p × p Toeplitz matrix, $\mathbf{a}_p$ is the vector of predictor coefficients expressed 
as 

$$\mathbf{a}_p' = [a_{p1} \;\; a_{p2} \;\; \cdots \;\; a_{pp}]$$

and $\boldsymbol{\phi}_p$ is a p-dimensional vector with elements 

$$\boldsymbol{\phi}_p' = [\phi(1) \;\; \phi(2) \;\; \cdots \;\; \phi(p)]$$

For a first-order (p = 1) predictor, we have the solution 

$$\phi(0)a_{11} = \phi(1), \qquad a_{11} = \phi(1)/\phi(0) \tag{A-2}$$

The residual mean square error (MSE) for the first-order predictor is 

$$\mathcal{E}_1 = \phi(0) - a_{11}\phi(1) = \phi(0) - a_{11}^2\phi(0) = \phi(0)(1 - a_{11}^2) \tag{A-3}$$

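In general, we may express the solution for the coefficients of an mth-order 
predictor in terms of the coefficients of the (m - 1)th-order predictor. Thus, we express 
$\mathbf{a}_m$ as the sum of two vectors, namely, 

$$\mathbf{a}_m = \begin{bmatrix} a_{m1} \\ a_{m2} \\ \vdots \\ a_{mm} \end{bmatrix} = \begin{bmatrix} \mathbf{a}_{m-1} \\ 0 \end{bmatrix} + \begin{bmatrix} \mathbf{d}_{m-1} \\ k_m \end{bmatrix} \tag{A-4}$$

where the vector $\mathbf{d}_{m-1}$ and the scalar $k_m$ are to be determined. Also, $\boldsymbol{\Phi}_m$ may be 
expressed as 

$$\boldsymbol{\Phi}_m = \begin{bmatrix} \boldsymbol{\Phi}_{m-1} & \boldsymbol{\phi}_{m-1}^{r} \\ \boldsymbol{\phi}_{m-1}^{r\prime} & \phi(0) \end{bmatrix} \tag{A-5}$$

where $\boldsymbol{\phi}_{m-1}^{r}$ is just the vector $\boldsymbol{\phi}_{m-1}$ in reverse order. 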

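Now 

$$\begin{bmatrix} \boldsymbol{\Phi}_{m-1} & \boldsymbol{\phi}_{m-1}^{r} \\ \boldsymbol{\phi}_{m-1}^{r\prime} & \phi(0) \end{bmatrix} \left( \begin{bmatrix} \mathbf{a}_{m-1} \\ 0 \end{bmatrix} + \begin{bmatrix} \mathbf{d}_{m-1} \\ k_m \end{bmatrix} \right) = \begin{bmatrix} \boldsymbol{\phi}_{m-1} \\ \phi(m) \end{bmatrix} \tag{A-6}$$

From (A-6), we obtain two equations. The first is the matrix equation 

$$\boldsymbol{\Phi}_{m-1}\mathbf{a}_{m-1} + \boldsymbol{\Phi}_{m-1}\mathbf{d}_{m-1} + k_m\boldsymbol{\phi}_{m-1}^{r} = \boldsymbol{\phi}_{m-1} \tag{A-7}$$

But $\boldsymbol{\Phi}_{m-1}\mathbf{a}_{m-1} = \boldsymbol{\phi}_{m-1}$. Hence, (A-7) simplifies to 

$$\boldsymbol{\Phi}_{m-1}\mathbf{d}_{m-1} + k_m\boldsymbol{\phi}_{m-1}^{r} = \mathbf{0} \tag{A-8}$$

This equation has the solution 

$$\mathbf{d}_{m-1} = -k_m\boldsymbol{\Phi}_{m-1}^{-1}\boldsymbol{\phi}_{m-1}^{r} \tag{A-9}$$

But $\boldsymbol{\phi}_{m-1}^{r}$ is just $\boldsymbol{\phi}_{m-1}$ in reverse order. Hence, the solution in (A-9) is simply $\mathbf{a}_{m-1}$ in 
reverse order multiplied by $-k_m$. That is, 

$$\mathbf{d}_{m-1} = -k_m \begin{bmatrix} a_{m-1\,m-1} \\ a_{m-1\,m-2} \\ \vdots \\ a_{m-1\,1} \end{bmatrix} \tag{A-10}$$

The second equation obtained from (A-6) is the scalar equation 

$$\boldsymbol{\phi}_{m-1}^{r\prime}\mathbf{a}_{m-1} + \boldsymbol{\phi}_{m-1}^{r\prime}\mathbf{d}_{m-1} + \phi(0)k_m = \phi(m) \tag{A-11}$$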

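We eliminate $\mathbf{d}_{m-1}$ from (A-11) by use of (A-10). The resulting equation gives us $k_m$. 
That is, 

$$k_m = \frac{\phi(m) - \boldsymbol{\phi}_{m-1}^{r\prime}\mathbf{a}_{m-1}}{\phi(0) - \boldsymbol{\phi}_{m-1}^{r\prime}\boldsymbol{\Phi}_{m-1}^{-1}\boldsymbol{\phi}_{m-1}^{r}} = \frac{\phi(m) - \boldsymbol{\phi}_{m-1}^{r\prime}\mathbf{a}_{m-1}}{\phi(0) - \mathbf{a}_{m-1}'\boldsymbol{\phi}_{m-1}} = \frac{\phi(m) - \boldsymbol{\phi}_{m-1}^{r\prime}\mathbf{a}_{m-1}}{\mathcal{E}_{m-1}} \tag{A-12}$$

where $\mathcal{E}_{m-1}$ is the residual MSE given as 

$$\mathcal{E}_{m-1} = \phi(0) - \mathbf{a}_{m-1}'\boldsymbol{\phi}_{m-1} \tag{A-13}$$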

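By substituting (A-10) for $\mathbf{d}_{m-1}$ in (A-4), we obtain the order-recursive relation 

$$a_{mk} = a_{m-1\,k} - k_m a_{m-1\,m-k}, \qquad k = 1, 2, \ldots, m - 1, \quad m = 1, 2, \ldots, p \tag{A-14}$$

and 

$$a_{mm} = k_m$$

The minimum MSE may also be computed recursively. We have 

$$\mathcal{E}_m = \phi(0) - \sum_{k=1}^{m} a_{mk}\phi(k) \tag{A-15}$$

Using (A-14) in (A-15), we obtain 

$$\mathcal{E}_m = \phi(0) - \sum_{k=1}^{m-1} a_{m-1\,k}\phi(k) - a_{mm}\left[\phi(m) - \sum_{k=1}^{m-1} a_{m-1\,m-k}\phi(k)\right] \tag{A-16}$$

But the term in square brackets in (A-16) is just the numerator of $k_m$ in (A-12). Hence, 

$$\mathcal{E}_m = \mathcal{E}_{m-1} - a_{mm}^2\mathcal{E}_{m-1} = \mathcal{E}_{m-1}(1 - a_{mm}^2) \tag{A-17}$$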
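
The recursion can be sketched in a few lines of code. The following is a minimal illustration, assuming real-valued autocorrelation values φ(0), ..., φ(p) supplied as a list and the relations $k_m = [\phi(m) - \sum_{k=1}^{m-1} a_{m-1\,m-k}\phi(k)]/\mathcal{E}_{m-1}$, $a_{mk} = a_{m-1\,k} - k_m a_{m-1\,m-k}$, $a_{mm} = k_m$, and $\mathcal{E}_m = \mathcal{E}_{m-1}(1 - a_{mm}^2)$. The function name `levinson_durbin` and the list-based interface are choices made for this sketch.

```python
def levinson_durbin(phi, p):
    """Solve the Toeplitz normal equations of order p by order recursion.

    phi : sequence with phi[0], ..., phi[p] (autocorrelation values)
    Returns (a, mse), where a[k-1] = a_{pk} and mse is the residual MSE of order p.
    """
    a = []            # coefficients of the current order; empty for order 0
    mse = phi[0]      # residual MSE of order 0
    for m in range(1, p + 1):
        # Numerator of k_m: phi(m) - sum_{k=1}^{m-1} a_{m-1,m-k} * phi(k)
        numerator = phi[m] - sum(a[m - k - 1] * phi[k] for k in range(1, m))
        k_m = numerator / mse
        # Order update: a_{mk} = a_{m-1,k} - k_m * a_{m-1,m-k}, and a_{mm} = k_m
        a = [a[k - 1] - k_m * a[m - k - 1] for k in range(1, m)] + [k_m]
        # MSE update: E_m = E_{m-1} * (1 - a_{mm}^2)
        mse *= 1.0 - k_m * k_m
    return a, mse


if __name__ == "__main__":
    # Autocorrelation phi(n) = 0.5**n of a first-order autoregressive process.
    coeffs, err = levinson_durbin([1.0, 0.5, 0.25], 2)
    print("coefficients:", coeffs, "residual MSE:", err)
```

Each order costs O(m) operations, so the complete solution requires O(p²) operations instead of the O(p³) of a general linear solver; this is the main attraction of exploiting the Toeplitz structure.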



APPENDIX 


ERROR PROBABILITY 
FOR MULTICHANNEL 
BINARY SIGNALS 


In multichannel communication systems that employ binary signaling for transmitting 
information over the AWGN channel, the decision variable at the detector can be 
expressed as a special case of the general quadratic form 

$$D = \sum_{k=1}^{L} \left( A|X_k|^2 + B|Y_k|^2 + CX_kY_k^* + C^*X_k^*Y_k \right) \tag{B-1}$$

in complex-valued gaussian random variables. A, B, and C are constants; $X_k$ and $Y_k$ are 
a pair of correlated complex-valued gaussian random variables. For the channels 
considered, the L pairs $\{X_k, Y_k\}$ are mutually statistically independent and identically 
distributed. 

The probability of error is the probability that D<0. This probability is evaluated 
below. 

The computation begins with the characteristic function, denoted by $\psi_D(jv)$, of the 
general quadratic form. The probability that D < 0, denoted here as the probability of 
error $P_b$, is 

$$P_b = P(D < 0) = \int_{-\infty}^{0} p(D)\,dD \tag{B-2}$$


where p(D), the probability density function of D, is related to $\psi_D(jv)$ by the Fourier 
transform, i.e., 

$$p(D) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \psi_D(jv)\,e^{-jvD}\,dv$$

Hence, 

$$P_b = \int_{-\infty}^{0} dD \, \frac{1}{2\pi} \int_{-\infty}^{\infty} \psi_D(jv)\,e^{-jvD}\,dv \tag{B-3}$$

Let us interchange the order of integration and carry out first the integration with 
respect to D. The result is 

$$P_b = -\frac{1}{2\pi j} \int_{-\infty + j\varepsilon}^{\infty + j\varepsilon} \frac{\psi_D(jv)}{v}\,dv \tag{B-4}$$

where a small positive number ε has been inserted in order to move the path of 
integration away from the singularity at v = 0 and which must be positive in order to 
allow for the interchange in the order of integration. 

Since D is the sum of statistically independent random variables, the characteristic 
function of D factors into a product of L characteristic functions, with each function 
corresponding to the individual random variables d k , where 

$$d_k = A|X_k|^2 + B|Y_k|^2 + CX_kY_k^* + C^*X_k^*Y_k$$

The characteristic function of $d_k$ is 

$$\psi_{d_k}(jv) = \frac{v_1v_2}{(v + jv_1)(v - jv_2)} \exp\left[\frac{v_1v_2(-v^2\alpha_{1k} + jv\alpha_{2k})}{(v + jv_1)(v - jv_2)}\right] \tag{B-5}$$


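where the parameters $v_1$, $v_2$, $\alpha_{1k}$, and $\alpha_{2k}$ depend on the means $\bar{X}_k$ and $\bar{Y}_k$ and the 
second (central) moments $\mu_{xx}$, $\mu_{yy}$, and $\mu_{xy}$ of the complex-valued gaussian variables $X_k$ 
and $Y_k$ through the following definitions ($|C|^2 - AB > 0$): 

$$v_1 = \sqrt{w^2 + \frac{1}{4(\mu_{xx}\mu_{yy} - |\mu_{xy}|^2)(|C|^2 - AB)}} - w$$

$$v_2 = \sqrt{w^2 + \frac{1}{4(\mu_{xx}\mu_{yy} - |\mu_{xy}|^2)(|C|^2 - AB)}} + w$$

$$w = \frac{A\mu_{xx} + B\mu_{yy} + C\mu_{xy}^* + C^*\mu_{xy}}{4(\mu_{xx}\mu_{yy} - |\mu_{xy}|^2)(|C|^2 - AB)} \tag{B-6}$$

$$\alpha_{1k} = 2(|C|^2 - AB)\left(|\bar{X}_k|^2\mu_{yy} + |\bar{Y}_k|^2\mu_{xx} - \bar{X}_k^*\bar{Y}_k\mu_{xy} - \bar{X}_k\bar{Y}_k^*\mu_{xy}^*\right)$$

$$\alpha_{2k} = A|\bar{X}_k|^2 + B|\bar{Y}_k|^2 + C\bar{X}_k^*\bar{Y}_k + C^*\bar{X}_k\bar{Y}_k^*$$

$$\mu_{xy} = \tfrac{1}{2}E[(X_k - \bar{X}_k)(Y_k - \bar{Y}_k)^*]$$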
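
The pole parameters $v_1$ and $v_2$ can be computed directly from the second moments. The sketch below assumes the reading of (B-6) in which the factor $4(\mu_{xx}\mu_{yy} - |\mu_{xy}|^2)(|C|^2 - AB)$ appears both in w and under the square root. This reading reproduces the known special case $D = |X|^2 - |Y|^2$ with independent zero-mean X and Y, for which the characteristic function has poles determined by $1/v_1 = 2\mu_{xx}$ and $1/v_2 = 2\mu_{yy}$. The function name `quadratic_form_poles` is hypothetical.

```python
import math


def quadratic_form_poles(A, B, C, mu_xx, mu_yy, mu_xy):
    """Return (v1, v2, w) for the quadratic form, under the assumed reading of (B-6).

    A, B, mu_xx, mu_yy are real; C and mu_xy may be real or complex.
    """
    det = mu_xx * mu_yy - abs(mu_xy) ** 2
    disc = abs(C) ** 2 - A * B
    if det <= 0 or disc <= 0:
        raise ValueError("requires mu_xx*mu_yy > |mu_xy|^2 and |C|^2 - A*B > 0")
    scale = 4.0 * det * disc
    # C*conj(mu_xy) + conj(C)*mu_xy equals twice the real part of C*conj(mu_xy).
    cross = 2.0 * (C * mu_xy.conjugate()).real
    w = (A * mu_xx + B * mu_yy + cross) / scale
    root = math.sqrt(w * w + 1.0 / scale)
    return root - w, root + w, w


if __name__ == "__main__":
    v1, v2, w = quadratic_form_poles(A=1.0, B=-1.0, C=0.0, mu_xx=2.0, mu_yy=0.5, mu_xy=0.0)
    print(f"v1 = {v1}, v2 = {v2}, w = {w}")
```

In the usage example the result is $v_1 = 1/(2\mu_{xx})$ and $v_2 = 1/(2\mu_{yy})$, as expected for the difference of two independent exponentially distributed variables.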

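Now, as a result of the independence of the random variables $d_k$, the characteristic 
function of D is 

$$\psi_D(jv) = \prod_{k=1}^{L} \psi_{d_k}(jv) = \frac{(v_1v_2)^L}{(v + jv_1)^L(v - jv_2)^L} \exp\left[\frac{v_1v_2(jv\alpha_2 - v^2\alpha_1)}{(v + jv_1)(v - jv_2)}\right] \tag{B-7}$$

where 

$$\alpha_1 = \sum_{k=1}^{L} \alpha_{1k}, \qquad \alpha_2 = \sum_{k=1}^{L} \alpha_{2k} \tag{B-8}$$
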
The result (B-7) is substituted for $\psi_D(jv)$ in (B-4), and we obtain 

$$P_b = -\frac{(v_1v_2)^L}{2\pi j} \int_{-\infty + j\varepsilon}^{\infty + j\varepsilon} \frac{dv}{v(v + jv_1)^L(v - jv_2)^L} \exp\left[\frac{v_1v_2(jv\alpha_2 - v^2\alpha_1)}{(v + jv_1)(v - jv_2)}\right] \tag{B-9}$$

This integral is evaluated as follows. 

The first step is to express the exponential function in the form 

$$\exp\left(-A_1 + \frac{jA_2}{v + jv_1} - \frac{jA_3}{v - jv_2}\right)$$

where one can easily verify that the constants $A_1$, $A_2$, and $A_3$ are given as 

$$A_1 = \alpha_1v_1v_2$$

$$A_2 = \frac{v_1^2v_2}{v_1 + v_2}(\alpha_1v_1 + \alpha_2) \tag{B-10}$$

$$A_3 = \frac{v_1v_2^2}{v_1 + v_2}(\alpha_1v_2 - \alpha_2)$$


Second, a conformal transformation is made from the v plane onto the p plane via
the change in variable

$$p = -\frac{v_1}{v_2}\,\frac{v - jv_2}{v + jv_1} \tag{B-11}$$

In the p plane, the integral given by (B-9) becomes

$$P_b = \frac{\exp\left[v_1v_2(-2\alpha_1v_1v_2 + \alpha_2v_1 - \alpha_2v_2)/(v_1+v_2)^2\right]}{(1 + v_2/v_1)^{2L-1}}\, \frac{1}{2\pi j}\oint_\Gamma f(p)\,dp \tag{B-12}$$

where

$$f(p) = \frac{[1 + (v_2/v_1)p]^{2L-1}}{p^L(1-p)} \exp\left[\frac{A_2(v_2/v_1)}{v_1+v_2}\,p + \frac{A_3(v_1/v_2)}{v_1+v_2}\,\frac{1}{p}\right] \tag{B-13}$$

and $\Gamma$ is a circular contour of radius less than unity that encloses the origin.
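The third step is to evaluate the integral

$$\frac{1}{2\pi j}\oint_\Gamma f(p)\,dp = \frac{1}{2\pi j}\oint_\Gamma \frac{[1 + (v_2/v_1)p]^{2L-1}}{p^L(1-p)} \exp\left[\frac{A_2(v_2/v_1)}{v_1+v_2}\,p + \frac{A_3(v_1/v_2)}{v_1+v_2}\,\frac{1}{p}\right] dp \tag{B-14}$$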


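In order to facilitate subsequent manipulations, the constants $a \ge 0$ and $b \ge 0$ are
introduced and defined as follows:

$$\tfrac{1}{2}a^2 = \frac{A_3(v_1/v_2)}{v_1+v_2}, \qquad \tfrac{1}{2}b^2 = \frac{A_2(v_2/v_1)}{v_1+v_2} \tag{B-15}$$

Let us also expand the function $[1 + (v_2/v_1)p]^{2L-1}$ as a binomial series. As a result, we
obtain

$$\frac{1}{2\pi j}\oint_\Gamma f(p)\,dp = \sum_{k=0}^{2L-1} \binom{2L-1}{k}\left(\frac{v_2}{v_1}\right)^k \frac{1}{2\pi j}\oint_\Gamma \frac{1}{p^{L-k}(1-p)} \exp\left(\frac{\tfrac{1}{2}a^2}{p} + \tfrac{1}{2}b^2p\right) dp \tag{B-16}$$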


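The contour integral given in (B-16) is one representation of the Bessel function. It
can be solved by making use of the relations

$$\begin{aligned}
I_n(ab) &= \frac{1}{2\pi j}\left(\frac{a}{b}\right)^n \oint_\Gamma \frac{1}{p^{n+1}} \exp\left(\frac{\tfrac{1}{2}a^2}{p} + \tfrac{1}{2}b^2p\right) dp \\
I_n(ab) &= \frac{1}{2\pi j}\left(\frac{b}{a}\right)^n \oint_\Gamma p^{n-1} \exp\left(\frac{\tfrac{1}{2}a^2}{p} + \tfrac{1}{2}b^2p\right) dp
\end{aligned}$$

where $I_n(x)$ is the nth-order modified Bessel function of the first kind, and the series
representation of Marcum's Q function in terms of Bessel functions, i.e.,

$$Q_1(a, b) = \exp\left[-\tfrac{1}{2}(a^2 + b^2)\right] \sum_{n=0}^{\infty} \left(\frac{a}{b}\right)^n I_n(ab)$$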

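First, consider the case $0 \le k \le L-2$ in (B-16). In this case, the resulting contour
integral can be written in the form†

$$\frac{1}{2\pi j}\oint_\Gamma \frac{1}{p^{L-k}(1-p)} \exp\left(\frac{\tfrac{1}{2}a^2}{p} + \tfrac{1}{2}b^2p\right) dp = Q_1(a, b)\exp\left[\tfrac{1}{2}(a^2+b^2)\right] + \sum_{n=1}^{L-1-k} I_n(ab)\left(\frac{b}{a}\right)^n \tag{B-17}$$

Next, consider the term k = L - 1. The resulting contour integral can be expressed in
terms of the Q function as follows:

$$\frac{1}{2\pi j}\oint_\Gamma \frac{1}{p(1-p)} \exp\left(\frac{\tfrac{1}{2}a^2}{p} + \tfrac{1}{2}b^2p\right) dp = Q_1(a, b)\exp\left[\tfrac{1}{2}(a^2+b^2)\right] \tag{B-18}$$

Finally, consider the case $L \le k \le 2L-1$. We have

$$\begin{aligned}
\frac{1}{2\pi j}\oint_\Gamma \frac{p^{k-L}}{1-p} \exp\left(\frac{\tfrac{1}{2}a^2}{p} + \tfrac{1}{2}b^2p\right) dp
&= \sum_{n=0}^{\infty} \frac{1}{2\pi j}\oint_\Gamma p^{k-L+n} \exp\left(\frac{\tfrac{1}{2}a^2}{p} + \tfrac{1}{2}b^2p\right) dp \\
&= \sum_{n=k+1-L}^{\infty} I_n(ab)\left(\frac{a}{b}\right)^n
= Q_1(a, b)\exp\left[\tfrac{1}{2}(a^2+b^2)\right] - \sum_{n=0}^{k-L} I_n(ab)\left(\frac{a}{b}\right)^n
\end{aligned} \tag{B-19}$$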

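† This contour integral is related to the generalized Marcum Q function, defined as

$$Q_m(a, b) = \int_b^\infty x\left(\frac{x}{a}\right)^{m-1} \exp\left[-\tfrac{1}{2}(x^2 + a^2)\right] I_{m-1}(ax)\, dx, \qquad m \ge 1$$

in the following manner:

$$Q_m(a, b)\exp\left[\tfrac{1}{2}(a^2+b^2)\right] = \frac{1}{2\pi j}\oint_\Gamma \frac{1}{p^m(1-p)} \exp\left(\frac{\tfrac{1}{2}a^2}{p} + \tfrac{1}{2}b^2p\right) dp$$

Collecting the terms that are indicated on the right-hand side of (B-16) and using
the results given in (B-17)-(B-19), the following expression for the contour integral is
obtained after some algebra:

$$\begin{aligned}
\frac{1}{2\pi j}\oint_\Gamma f(p)\,dp ={}& \left(1 + \frac{v_2}{v_1}\right)^{2L-1}\left\{\exp\left[\tfrac{1}{2}(a^2+b^2)\right]Q_1(a, b) - I_0(ab)\right\} \\
&+ I_0(ab)\sum_{k=0}^{L-1}\binom{2L-1}{k}\left(\frac{v_2}{v_1}\right)^k \\
&+ \sum_{n=1}^{L-1} I_n(ab) \sum_{k=0}^{L-1-n}\binom{2L-1}{k}\left[\left(\frac{b}{a}\right)^n\left(\frac{v_2}{v_1}\right)^k - \left(\frac{a}{b}\right)^n\left(\frac{v_2}{v_1}\right)^{2L-1-k}\right]
\end{aligned} \tag{B-20}$$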

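Equation (B-20) in conjunction with (B-12) gives the result for the probability of
error. A further simplification results when one uses the following identity, which can
easily be proved:

$$\exp\left[\frac{v_1v_2}{(v_1+v_2)^2}(-2\alpha_1v_1v_2 + \alpha_2v_1 - \alpha_2v_2)\right] = \exp\left[-\tfrac{1}{2}(a^2+b^2)\right]$$

Therefore, it follows that

$$\begin{aligned}
P_b ={}& Q_1(a, b) - I_0(ab)\exp\left[-\tfrac{1}{2}(a^2+b^2)\right] \\
&+ \frac{I_0(ab)\exp\left[-\tfrac{1}{2}(a^2+b^2)\right]}{(1 + v_2/v_1)^{2L-1}} \sum_{k=0}^{L-1}\binom{2L-1}{k}\left(\frac{v_2}{v_1}\right)^k \\
&+ \frac{\exp\left[-\tfrac{1}{2}(a^2+b^2)\right]}{(1 + v_2/v_1)^{2L-1}} \sum_{n=1}^{L-1} I_n(ab) \sum_{k=0}^{L-1-n}\binom{2L-1}{k}\left[\left(\frac{b}{a}\right)^n\left(\frac{v_2}{v_1}\right)^k - \left(\frac{a}{b}\right)^n\left(\frac{v_2}{v_1}\right)^{2L-1-k}\right] \qquad (L > 1) \\
P_b ={}& Q_1(a, b) - \frac{v_2/v_1}{1 + v_2/v_1}\, I_0(ab)\exp\left[-\tfrac{1}{2}(a^2+b^2)\right] \qquad (L = 1)
\end{aligned} \tag{B-21}$$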

This is the desired expression for the probability of error. It is now a simple matter
to relate the parameters a and b to the moments of the pairs $\{X_k, Y_k\}$. Substituting for
$A_2$ and $A_3$ from (B-10) into (B-15), we obtain

$$\begin{aligned}
a &= \left[\frac{2v_1^2v_2(\alpha_1v_2 - \alpha_2)}{(v_1+v_2)^2}\right]^{1/2} \\
b &= \left[\frac{2v_1v_2^2(\alpha_1v_1 + \alpha_2)}{(v_1+v_2)^2}\right]^{1/2}
\end{aligned} \tag{B-22}$$

Since $v_1$, $v_2$, $\alpha_1$, and $\alpha_2$ have been given in (B-6) and (B-8) directly in terms of the
moments of the pairs $X_k$ and $Y_k$, our task is completed.
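
As a numerical illustration of the single-channel (L = 1) case of (B-21), the following minimal Python sketch evaluates $P_b = Q_1(a,b) - \frac{v_2/v_1}{1+v_2/v_1} I_0(ab)\exp[-\tfrac{1}{2}(a^2+b^2)]$. It assumes the series form of Marcum's Q function quoted above, which is used here only for $0 \le a < b$. The function names `bessel_i`, `marcum_q1`, and `pb_single_channel` are hypothetical helpers introduced for this sketch and do not come from the text.

```python
import math


def bessel_i(n, x, terms=80):
    """Modified Bessel function of the first kind I_n(x), n >= 0, via its power series."""
    term = (x / 2.0) ** n / math.factorial(n)  # m = 0 term of the series
    total = term
    for m in range(1, terms):
        term *= (x / 2.0) ** 2 / (m * (m + n))  # ratio of consecutive series terms
        total += term
    return total


def marcum_q1(a, b, terms=60):
    """Q_1(a, b) from the Bessel series; this sketch only accepts 0 <= a < b."""
    if not (0.0 <= a < b):
        raise ValueError("series form used here requires 0 <= a < b")
    series = sum((a / b) ** n * bessel_i(n, a * b) for n in range(terms))
    return math.exp(-0.5 * (a * a + b * b)) * series


def pb_single_channel(a, b, v1, v2):
    """L = 1 branch of (B-21)."""
    ratio = v2 / v1
    tail = bessel_i(0, a * b) * math.exp(-0.5 * (a * a + b * b))
    return marcum_q1(a, b) - ratio / (1.0 + ratio) * tail
```

With a = 0, $b = \sqrt{2\gamma}$, and $v_1 = v_2$, the expression collapses to $\tfrac{1}{2}e^{-\gamma}$, which is a convenient sanity check on any implementation.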



APPENDIX C


ERROR PROBABILITIES
FOR ADAPTIVE RECEPTION
OF M-PHASE SIGNALS


In this appendix, we derive probabilities of error for two- and four-phase signaling over
an L-diversity-branch time-invariant additive gaussian noise channel and for M-phase
signaling over an L-diversity-branch Rayleigh fading additive gaussian noise channel.
Both channels corrupt the signaling waveforms transmitted through them by introducing
additive white gaussian noise and an unknown or random multiplicative gain and
phase shift in the transmitted signal. The receiver processing consists of
cross-correlating the signal plus noise received over each diversity branch with a noisy
reference signal, which is derived either from the previously received
information-bearing signals or from the transmission and reception of a pilot signal, and adding the
outputs from all L diversity branches to form the decision variable.


C-1 MATHEMATICAL MODEL FOR AN M-PHASE
SIGNALING COMMUNICATIONS SYSTEM

In the general case of M-phase signaling, the signaling waveforms at the transmitter
are†

$$s_n(t) = \mathrm{Re}\left[s_{ln}(t)\, e^{j2\pi f_c t}\right]$$

where

$$s_{ln}(t) = g(t)\exp\left[j\frac{2\pi}{M}(n-1)\right], \qquad n = 1, 2, \ldots, M, \quad 0 \le t \le T \tag{C-1}$$

and T is the time duration of the signaling interval.

† The complex representation of real signals is used throughout. Complex conjugation is
denoted by an asterisk.

Consider the case in which one of these M waveforms is transmitted, for the
duration of the signaling interval, over L channels. Assume that each of the channels
corrupts the signaling waveform transmitted through it by introducing a multiplicative
gain and phase shift, represented by the complex-valued number $g_k$, and an additive
noise $z_k(t)$. Thus, when the transmitted waveform is $s_{ln}(t)$, the waveform received over
the kth channel is

$$r_{lk}(t) = g_k s_{ln}(t) + z_k(t), \qquad 0 \le t \le T, \quad k = 1, 2, \ldots, L \tag{C-2}$$

The noises $\{z_k(t)\}$ are assumed to be sample functions of a stationary white gaussian
random process with zero mean and autocorrelation function $\phi_{zz}(\tau) = N_0\delta(\tau)$, where $N_0$
is the value of the spectral density. These sample functions are assumed to be mutually
statistically independent.

At the demodulator, $r_{lk}(t)$ is passed through a filter whose impulse response is
matched to the waveform $g(t)$. The output of this filter, sampled at time t = T, is
denoted as

$$X_k = 2\mathcal{E} g_k \exp\left[j\frac{2\pi}{M}(n-1)\right] + N_k \tag{C-3}$$

where $\mathcal{E}$ is the transmitted signal energy per channel and $N_k$ is the noise sample from
the kth filter. In order for the demodulator to decide which of the M phases was
transmitted in the signaling interval $0 \le t \le T$, it attempts to undo the phase shift
introduced by each channel. In practice, this is accomplished by multiplying the
matched filter output $X_k$ by the complex conjugate of an estimate $\hat{g}_k$ of the channel gain
and phase shift. The result is a weighted and phase-shifted sampled output from the
kth-channel filter, which is then added to the weighted and phase-shifted sampled
outputs from the other L - 1 channel filters.

The estimate $\hat{g}_k$ of the gain and phase shift of the kth channel is assumed to be
derived either from the transmission of a pilot signal or by undoing the modulation on
the information-bearing signals received in previous signaling intervals. As an example
of the former, suppose that a pilot signal, denoted by $s_{pk}(t)$, $0 \le t \le T$, is transmitted
over the kth channel for the purpose of measuring the channel gain and phase shift.
The received waveform is

$$g_k s_{pk}(t) + z_{pk}(t), \qquad 0 \le t \le T$$

where $z_{pk}(t)$ is a sample function of a stationary white gaussian random process with
zero mean and autocorrelation function $\phi_{pp}(\tau) = N_0\delta(\tau)$. This signal plus noise is passed
through a filter matched to $s_{pk}(t)$. The filter output is sampled at time t = T to yield the
random variable $X_{pk} = 2\mathcal{E}_p g_k + N_{pk}$, where $\mathcal{E}_p$ is the energy in the pilot signal, which is
assumed to be identical for all channels, and $N_{pk}$ is the additive noise sample. An
estimate of $g_k$ is obtained by properly normalizing $X_{pk}$, i.e., $\hat{g}_k = g_k + N_{pk}/2\mathcal{E}_p$.

On the other hand, an estimate of $g_k$ can be obtained from the information-bearing
signal as follows. If one knew the information component contained in the matched
filter output then an estimate of $g_k$ could be obtained by properly normalizing this
output. For example, the information component in the filter output given by (C-3) is
$2\mathcal{E} g_k \exp[j(2\pi/M)(n-1)]$, and hence, the estimate is

$$\hat{g}_k = \frac{X_k}{2\mathcal{E}}\exp\left[-j\frac{2\pi}{M}(n-1)\right] = g_k + \frac{N_k'}{2\mathcal{E}}$$

where $N_k' = N_k \exp[-j(2\pi/M)(n-1)]$ and the pdf of $N_k'$ is identical to the pdf of $N_k$.
An estimate that is obtained from the information-bearing signal in this manner is
called a clairvoyant estimate. Although a physically realizable receiver does not possess
such clairvoyance, it can approximate this estimate by employing a time delay of one
signaling interval and by feeding back the estimate of the transmitted phase in the
previous signaling interval.

Whether the estimate of $g_k$ is obtained from a pilot signal or from the
information-bearing signal, the estimate can be improved by extending the time interval over which
it is formed to include several prior signaling intervals in a way that has been described
by Price (1962a, b). As a result of extending the measurement interval, the
signal-to-noise ratio in the estimate of $g_k$ is increased. In the general case where the
estimation interval is the infinite past, the normalized pilot signal estimate is

$$\hat{g}_k = g_k + \frac{\sum_{i=1}^{\infty} c_i N_{pki}}{2\mathcal{E}_p \sum_{i=1}^{\infty} c_i} \tag{C-4}$$

where $c_i$ is the weighting coefficient on the subestimate of $g_k$ derived from the ith prior
signal interval and $N_{pki}$ is the sample of additive gaussian noise at the output of the filter
matched to $s_{pk}(t)$ in the ith prior signaling interval. Similarly, the clairvoyant estimate
that is obtained from the information-bearing signal by undoing the modulation over
the infinite past is

$$\hat{g}_k = g_k + \frac{\sum_{i=1}^{\infty} c_i N_{ki}}{2\mathcal{E} \sum_{i=1}^{\infty} c_i} \tag{C-5}$$

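As indicated, the demodulator forms the product between $\hat{g}_k^*$ and $X_k$ and adds this to
the products of the other L - 1 channels. The random variable that results is

$$z = \sum_{k=1}^{L} X_k \hat{g}_k^* = \sum_{k=1}^{L} X_k Y_k^* = z_r + jz_i \tag{C-6}$$

where, by definition, $Y_k = \hat{g}_k$, $z_r = \mathrm{Re}(z)$, and $z_i = \mathrm{Im}(z)$. The phase of z is the decision
variable. This is simply

$$\theta = \tan^{-1}\left(\frac{z_i}{z_r}\right) = \tan^{-1}\left[\frac{\mathrm{Im}\left(\sum_{k=1}^{L} X_k Y_k^*\right)}{\mathrm{Re}\left(\sum_{k=1}^{L} X_k Y_k^*\right)}\right] \tag{C-7}$$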
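
The combining rule in (C-6) and (C-7) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `phase_decision` and the nearest-phase rounding rule are assumptions added here, and `cmath.phase` is used as the four-quadrant form of $\tan^{-1}(z_i/z_r)$.

```python
import cmath
import math


def phase_decision(x, y, M):
    """Form z = sum_k X_k * conj(Y_k) and map its phase to the nearest of M phases.

    x: matched filter outputs X_k (complex), one per diversity branch
    y: channel estimates Y_k (complex), one per diversity branch
    Returns (z, theta, n) with n in 1..M.
    """
    z = sum(xk * yk.conjugate() for xk, yk in zip(x, y))
    theta = cmath.phase(z)  # four-quadrant arctangent of z_i / z_r
    n = round(theta / (2.0 * math.pi / M)) % M + 1  # nearest signal phase index
    return z, theta, n


# Noise-free example: unit energy, two branches, phase n = 3 of M = 4 transmitted.
gains = [0.8 + 0.3j, -0.2 + 1.1j]
outputs = [2.0 * g * cmath.exp(1j * 2.0 * math.pi * (3 - 1) / 4) for g in gains]
z, theta, n = phase_decision(outputs, gains, 4)
```

In the noise-free example the estimates equal the true gains, so each product $X_k Y_k^*$ has the transmitted phase and the branches add coherently.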


C-2 CHARACTERISTIC FUNCTION
AND PROBABILITY DENSITY FUNCTION
OF THE PHASE θ

The following derivation is based on the assumption that the transmitted signal phase is
zero, i.e., n = 1. If desired, the pdf of $\theta$ conditional on any other transmitted signal
phase can be obtained by translating $p(\theta)$ by the angle $2\pi(n-1)/M$. We also assume
that the complex-valued numbers $\{g_k\}$, which characterize the L channels, are mutually
statistically independent and identically distributed zero-mean gaussian random
variables. This characterization is appropriate for slowly Rayleigh fading channels. As a
consequence, the random variables $(X_k, Y_k)$ are correlated, complex-valued,
zero-mean, and gaussian, and each pair is statistically independent of, but identically
distributed with, any other pair $(X_i, Y_i)$.

The method that has been used in evaluating the probability density $p(\theta)$ in the
general case of diversity reception is as follows. First, the characteristic function of the
joint probability distribution function of $z_r$ and $z_i$, where $z_r$ and $z_i$ are two components
that make up the decision variable $\theta$, is obtained. Second, the double Fourier transform
of the characteristic function is performed and yields the density $p(z_r, z_i)$. Then the
transformation

$$r = \sqrt{z_r^2 + z_i^2}, \qquad \theta = \tan^{-1}\left(\frac{z_i}{z_r}\right) \tag{C-8}$$

yields the joint pdf of the envelope r and the phase $\theta$. Finally, integration of this joint
pdf over the random variable r yields the pdf of $\theta$.

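The joint characteristic function of the random variables $z_r$ and $z_i$ can be expressed in
the form

$$\psi(jv_1, jv_2) = \left[\frac{\dfrac{4}{m_{xx}m_{yy}(1-|\mu|^2)}}{\left(v_1 - j\dfrac{2|\mu|\cos\varepsilon}{\sqrt{m_{xx}m_{yy}}\,(1-|\mu|^2)}\right)^2 + \left(v_2 - j\dfrac{2|\mu|\sin\varepsilon}{\sqrt{m_{xx}m_{yy}}\,(1-|\mu|^2)}\right)^2 + \dfrac{4}{m_{xx}m_{yy}(1-|\mu|^2)^2}}\right]^L \tag{C-9}$$

where, by definition,

$$\begin{aligned}
m_{xx} &= E(|X_k|^2) \quad \text{identical for all } k \\
m_{yy} &= E(|Y_k|^2) \quad \text{identical for all } k \\
m_{xy} &= E(X_kY_k^*) \quad \text{identical for all } k \\
\mu &= \frac{m_{xy}}{\sqrt{m_{xx}m_{yy}}} = |\mu|e^{j\varepsilon}
\end{aligned} \tag{C-10}$$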

The result of Fourier-transforming the function $\psi(jv_1, jv_2)$ with respect to the
variables $v_1$ and $v_2$ is

$$p(z_r, z_i) = \frac{(1-|\mu|^2)^L}{(L-1)!\,\pi 2^L}\left(\sqrt{z_r^2 + z_i^2}\right)^{L-1} \exp\left[|\mu|(z_r\cos\varepsilon + z_i\sin\varepsilon)\right] K_{L-1}\left(\sqrt{z_r^2 + z_i^2}\right) \tag{C-11}$$

where $K_n(x)$ is the modified Hankel function of order n. Then the transformation of
random variables, as indicated in (C-8), yields the joint pdf of the envelope r and the
phase $\theta$ in the form

$$p(r, \theta) = \frac{(1-|\mu|^2)^L}{(L-1)!\,\pi 2^L}\, r^L \exp\left[|\mu| r\cos(\theta - \varepsilon)\right] K_{L-1}(r) \tag{C-12}$$

Now, integration over the variable r yields the marginal pdf of the phase $\theta$. We have
evaluated the integral to obtain $p(\theta)$ in the form

$$p(\theta) = \frac{(-1)^{L-1}(1-|\mu|^2)^L}{2\pi(L-1)!}\, \frac{\partial^{L-1}}{\partial b^{L-1}} \left\{ \frac{1}{b - |\mu|^2\cos^2(\theta - \varepsilon)} + \frac{|\mu|\cos(\theta - \varepsilon)}{[b - |\mu|^2\cos^2(\theta - \varepsilon)]^{3/2}} \cos^{-1}\left[-\frac{|\mu|\cos(\theta - \varepsilon)}{b^{1/2}}\right] \right\}_{b=1} \tag{C-13}$$

In this equation, the notation

$$\left.\frac{\partial^L}{\partial b^L} f(b, \mu)\right|_{b=1}$$

denotes the Lth partial derivative of the function $f(b, \mu)$ evaluated at b = 1.


C-3 ERROR PROBABILITIES FOR SLOWLY 
RAYLEIGH FADING CHANNELS 

In this section, the probability of a character error and the probability of a binary digit
error are derived for M-phase signaling. The probabilities are evaluated via the
probability density function and the probability distribution function of $\theta$.


The Probability Distribution Function of the Phase

In order to evaluate the probability of error, we need to evaluate the definite integral

$$P(\theta_1 \le \theta \le \theta_2) = \int_{\theta_1}^{\theta_2} p(\theta)\, d\theta$$

where $\theta_1$ and $\theta_2$ are limits of integration and $p(\theta)$ is given by (C-13). All subsequent
calculations are made for a real cross-correlation coefficient $\mu$. A real-valued $\mu$ implies
that the signals have symmetric spectra. This is the usual situation encountered. Since a
complex-valued $\mu$ causes a shift of $\varepsilon$ in the pdf of $\theta$, i.e., $\varepsilon$ is simply a bias term, the
results that are given for real $\mu$ can be altered in a trivial way to cover the more general
case of complex-valued $\mu$.

In the integration of $p(\theta)$, only the range $0 \le \theta \le \pi$ is considered, because $p(\theta)$ is an
even function. Furthermore, the continuity of the integrand and its derivatives and the
fact that the limits $\theta_1$ and $\theta_2$ are independent of b allow for the interchange of
integration and differentiation. When this is done, the resulting integral can be
evaluated quite readily and can be expressed as follows:

$$\int_{\theta_1}^{\theta_2} p(\theta)\, d\theta = \frac{(-1)^{L-1}(1-\mu^2)^L}{2\pi(L-1)!}\, \frac{\partial^{L-1}}{\partial b^{L-1}} \left\{ \frac{1}{b-\mu^2}\left[ \frac{\mu\sqrt{1-(b/\mu^2-1)x^2}}{b^{1/2}}\cot^{-1}x - \cot^{-1}\left(\frac{xb^{1/2}/\mu}{\sqrt{1-(b/\mu^2-1)x^2}}\right)\right]_{x_1}^{x_2} \right\}_{b=1} \tag{C-14}$$

where, by definition,

$$x_i = \frac{-\mu\cos\theta_i}{\sqrt{b - \mu^2\cos^2\theta_i}}, \qquad i = 1, 2 \tag{C-15}$$

Probability of a Symbol Error

The probability of a symbol error for any M-phase signaling system is

$$P_M = 2\int_{\pi/M}^{\pi} p(\theta)\, d\theta$$

When (C-14) is evaluated at these two limits, the result is

$$P_M = \frac{(-1)^{L-1}(1-\mu^2)^L}{\pi(L-1)!}\, \frac{\partial^{L-1}}{\partial b^{L-1}} \left\{ \frac{1}{b-\mu^2}\left[ \frac{\pi}{M}(M-1) - \frac{\mu\sin(\pi/M)}{\sqrt{b - \mu^2\cos^2(\pi/M)}} \cot^{-1}\left(\frac{-\mu\cos(\pi/M)}{\sqrt{b - \mu^2\cos^2(\pi/M)}}\right)\right] \right\}_{b=1} \tag{C-16}$$


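Probability of a Binary Digit Error

First, let us consider two-phase signaling. In
this case, the probability of a binary digit error is obtained by integrating the pdf $p(\theta)$
over the range $\tfrac{1}{2}\pi < \theta < \tfrac{3}{2}\pi$. Since $p(\theta)$ is an even function and the signals are a priori
equally likely, this probability can be written as

$$P_2 = 2\int_{\pi/2}^{\pi} p(\theta)\, d\theta$$

It is easily verified that $\theta_1 = \tfrac{1}{2}\pi$ implies $x_1 = 0$ and $\theta_2 = \pi$ implies $x_2 = \mu/\sqrt{b - \mu^2}$. Thus,

$$P_2 = \frac{(-1)^{L-1}(1-\mu^2)^L}{2(L-1)!}\, \frac{\partial^{L-1}}{\partial b^{L-1}} \left[\frac{1}{b-\mu^2} - \frac{\mu}{b^{1/2}(b-\mu^2)}\right]_{b=1} \tag{C-17}$$

After performing the differentiation indicated in (C-17) and evaluating the resulting
function at b = 1, the probability of a binary digit error is obtained in the form

$$P_2 = \frac{1}{2}\left[1 - \mu\sum_{k=0}^{L-1}\binom{2k}{k}\left(\frac{1-\mu^2}{4}\right)^k\right] \tag{C-18}$$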
Next, we consider the case of four-phase signaling in which a Gray code is used to map
pairs of bits into phases. Assuming again that the transmitted signal is $s_{l1}(t)$, it is clear
that a single error is committed when the received phase is $\tfrac{1}{4}\pi < \theta < \tfrac{3}{4}\pi$, and a double
error is committed when the received phase is $\tfrac{3}{4}\pi < \theta < \pi$. That is, the probability of a
binary digit error is

$$P_{4b} = \int_{\pi/4}^{3\pi/4} p(\theta)\, d\theta + 2\int_{3\pi/4}^{\pi} p(\theta)\, d\theta \tag{C-19}$$

It is easily established from (C-14) and (C-19) that

$$P_{4b} = \frac{(-1)^{L-1}(1-\mu^2)^L}{2(L-1)!}\, \frac{\partial^{L-1}}{\partial b^{L-1}} \left[\frac{1}{b-\mu^2} - \frac{\mu}{(b-\mu^2)(2b-\mu^2)^{1/2}}\right]_{b=1}$$

Hence, the probability of a binary digit error for four-phase signaling is

$$P_{4b} = \frac{1}{2}\left[1 - \frac{\mu}{\sqrt{2-\mu^2}}\sum_{k=0}^{L-1}\binom{2k}{k}\left(\frac{1-\mu^2}{4-2\mu^2}\right)^k\right] \tag{C-20}$$

Note that if one defines the quantity $\rho = \mu/\sqrt{2-\mu^2}$, the expression for $P_{4b}$ in terms
of $\rho$ is

$$P_{4b} = \frac{1}{2}\left[1 - \rho\sum_{k=0}^{L-1}\binom{2k}{k}\left(\frac{1-\rho^2}{4}\right)^k\right] \tag{C-21}$$

In other words, $P_{4b}$ has the same form as $P_2$ given in (C-18). Furthermore, note that $\rho$,
just like $\mu$, can be interpreted as a cross-correlation coefficient, since the range of $\rho$ is
$0 \le \rho \le 1$ for $0 \le \mu \le 1$. This simple fact will be used in Section C-4.

The above procedure for obtaining the bit error probability for an M-phase signal
with a Gray code can be used to generate results for M = 8, 16, etc., as shown by
Proakis (1968).
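
The closed forms for the Rayleigh fading channel are easy to evaluate numerically. The sketch below assumes the forms $P_2 = \tfrac{1}{2}[1 - \mu\sum_{k=0}^{L-1}\binom{2k}{k}((1-\mu^2)/4)^k]$ for (C-18) and the substitution $\rho = \mu/\sqrt{2-\mu^2}$ for (C-21), with $\mu$ real and $0 \le \mu \le 1$. The function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math


def p2_rayleigh(mu, L):
    """Two-phase binary digit error probability, L-branch Rayleigh fading (assumed form of C-18)."""
    series = sum(math.comb(2 * k, k) * ((1.0 - mu * mu) / 4.0) ** k for k in range(L))
    return 0.5 * (1.0 - mu * series)


def p4b_rayleigh(mu, L):
    """Four-phase (Gray coded) binary digit error probability via rho = mu / sqrt(2 - mu^2)."""
    rho = mu / math.sqrt(2.0 - mu * mu)
    return p2_rayleigh(rho, L)
```

For L = 1 and differential phase signaling, where $\mu = \bar{\gamma}_c/(1+\bar{\gamma}_c)$, the first function reduces to $1/[2(1+\bar{\gamma}_c)]$, which gives a quick check of the series.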


Evaluation of the Cross-Correlation Coefficient

The expressions for the probabilities of error given above depend on a single parameter,
namely, the cross-correlation coefficient $\mu$. The clairvoyant estimate is given by (C-5), and the matched
filter output, when signal waveform $s_{l1}(t)$ is transmitted, is $X_k = 2\mathcal{E} g_k + N_k$. Hence, the
cross-correlation coefficient is

$$\mu = \frac{\sqrt{v}}{\sqrt{(\bar{\gamma}_c^{-1} + 1)(\bar{\gamma}_c^{-1} + v)}} \tag{C-22}$$

where, by definition,

$$v = \frac{\left|\sum_{i=1}^{\infty} c_i\right|^2}{\sum_{i=1}^{\infty} |c_i|^2}, \qquad \bar{\gamma}_c = \frac{\mathcal{E}}{N_0} E(|g_k|^2), \quad k = 1, 2, \ldots, L \tag{C-23}$$

The parameter v represents the effective number of signaling intervals over which the
estimate is formed, and $\bar{\gamma}_c$ is the average SNR per channel.

In the case of differential phase signaling, the weighting coefficients are $c_1 = 1$, $c_i = 0$
for $i \ne 1$. Hence, v = 1 and $\mu = \bar{\gamma}_c/(1 + \bar{\gamma}_c)$.

When $v = \infty$, the estimate is perfect and

$$\lim_{v \to \infty} \mu = \sqrt{\frac{\bar{\gamma}_c}{\bar{\gamma}_c + 1}}$$
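Finally, in the case of a pilot signal estimate, given by (C-4), the cross-correlation
coefficient is

$$\mu = \frac{\sqrt{rv}}{(r+1)\sqrt{\left(\dfrac{1}{\bar{\gamma}_t} + \dfrac{r}{r+1}\right)\left(\dfrac{1}{\bar{\gamma}_t} + \dfrac{v}{r+1}\right)}} \tag{C-24}$$

where, by definition,

$$\bar{\gamma}_t = \frac{\mathcal{E}_t}{N_0} E(|g_k|^2), \qquad \mathcal{E}_t = \mathcal{E} + \mathcal{E}_p, \qquad r = \mathcal{E}/\mathcal{E}_p$$

The values of $\mu$ given above are summarized in Table C-1.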
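
To make the role of the parameter v concrete, the following sketch computes the effective number of signaling intervals from a set of weighting coefficients and the resulting cross-correlation coefficient for the clairvoyant estimate. It assumes real, nonnegative weights, the form $v = (\sum_i c_i)^2/\sum_i c_i^2$, and the clairvoyant expression $\mu = \sqrt{v}/\sqrt{(\bar{\gamma}_c^{-1}+1)(\bar{\gamma}_c^{-1}+v)}$. The function names are hypothetical.

```python
import math


def effective_intervals(c):
    """v = (sum c_i)^2 / sum c_i^2 for real, nonnegative weights (assumed form)."""
    return sum(c) ** 2 / sum(ci * ci for ci in c)


def mu_clairvoyant(gamma_c, v):
    """Cross-correlation coefficient for the clairvoyant estimate (assumed form)."""
    return math.sqrt(v) / math.sqrt((1.0 / gamma_c + 1.0) * (1.0 / gamma_c + v))


def mu_perfect(gamma_c):
    """Limit of the clairvoyant coefficient as v grows without bound."""
    return math.sqrt(gamma_c / (gamma_c + 1.0))
```

A single unit weight gives v = 1, which reproduces the differential phase signaling value $\bar{\gamma}_c/(1+\bar{\gamma}_c)$, while equal weights over n intervals give v = n and move $\mu$ toward the perfect-estimate limit.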


TABLE C-1 RAYLEIGH FADING CHANNEL

Type of estimate                Cross-correlation coefficient $\mu$

Clairvoyant estimate            $\dfrac{\sqrt{v}}{\sqrt{(\bar{\gamma}_c^{-1} + 1)(\bar{\gamma}_c^{-1} + v)}}$

Pilot signal estimate           $\dfrac{\sqrt{rv}}{(r+1)\sqrt{\left(\dfrac{1}{\bar{\gamma}_t} + \dfrac{r}{r+1}\right)\left(\dfrac{1}{\bar{\gamma}_t} + \dfrac{v}{r+1}\right)}}$

Differential phase signaling    $\dfrac{\bar{\gamma}_c}{\bar{\gamma}_c + 1}$

Perfect estimate                $\sqrt{\dfrac{\bar{\gamma}_c}{\bar{\gamma}_c + 1}}$


C-4 ERROR PROBABILITIES FOR TIME-INVARIANT
AND RICEAN FADING CHANNELS

In Section C-2, the complex-valued channel gains $\{g_k\}$ were characterized as zero-mean
gaussian random variables, which is appropriate for Rayleigh fading channels. In this
section, the channel gains $\{g_k\}$ are assumed to be nonzero-mean gaussian random
variables. Estimates of the channel gains are formed by the demodulator and are used
as described in Section C-1. Moreover, the decision variable $\theta$ is defined again by (C-7).
However, in this case, the gaussian random variables $X_k$ and $Y_k$, which denote the
matched filter output and the estimate, respectively, for the kth channel, have nonzero
means, which are denoted by $\bar{X}_k$ and $\bar{Y}_k$. Furthermore, the second moments are

$$\begin{aligned}
m_{xx} &= E(|X_k - \bar{X}_k|^2) \quad \text{identical for all channels} \\
m_{yy} &= E(|Y_k - \bar{Y}_k|^2) \quad \text{identical for all channels} \\
m_{xy} &= E[(X_k - \bar{X}_k)(Y_k^* - \bar{Y}_k^*)] \quad \text{identical for all channels}
\end{aligned}$$

and the normalized covariance is defined as

$$\mu = \frac{m_{xy}}{\sqrt{m_{xx}m_{yy}}}$$

Error probabilities are given below only for two- and four-phase signaling with this
channel model. We are interested in the special case in which the fluctuating component
of each of the channel gains $\{g_k\}$ is zero, so that the channels are time-invariant. If, in
addition to this time invariance, the noises between the estimate and the matched filter
output are uncorrelated, then $\mu = 0$.

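In the general case, the probability of error for two-phase signaling over L
statistically independent channels characterized in the manner described above can be
obtained from the results in Appendix B. In its most general form, the expression for
the binary error rate is

$$\begin{aligned}
P_2 ={}& Q_1(a, b) - I_0(ab)\exp\left[-\tfrac{1}{2}(a^2+b^2)\right] \\
&+ \frac{I_0(ab)\exp\left[-\tfrac{1}{2}(a^2+b^2)\right]}{[2/(1-\mu)]^{2L-1}} \sum_{k=0}^{L-1}\binom{2L-1}{k}\left(\frac{1+\mu}{1-\mu}\right)^k \\
&+ \frac{\exp\left[-\tfrac{1}{2}(a^2+b^2)\right]}{[2/(1-\mu)]^{2L-1}} \sum_{n=1}^{L-1} I_n(ab) \sum_{k=0}^{L-1-n}\binom{2L-1}{k}\left[\left(\frac{b}{a}\right)^n\left(\frac{1+\mu}{1-\mu}\right)^k - \left(\frac{a}{b}\right)^n\left(\frac{1+\mu}{1-\mu}\right)^{2L-1-k}\right] \qquad (L \ge 2) \\
P_2 ={}& Q_1(a, b) - \tfrac{1}{2}(1+\mu)\, I_0(ab)\exp\left[-\tfrac{1}{2}(a^2+b^2)\right] \qquad (L = 1)
\end{aligned} \tag{C-25}$$

where, by definition,

$$\begin{aligned}
a &= \left(\frac{1}{2}\sum_{k=1}^{L}\left|\frac{\bar{X}_k}{\sqrt{m_{xx}}} - \frac{\bar{Y}_k}{\sqrt{m_{yy}}}\right|^2\right)^{1/2} \\
b &= \left(\frac{1}{2}\sum_{k=1}^{L}\left|\frac{\bar{X}_k}{\sqrt{m_{xx}}} + \frac{\bar{Y}_k}{\sqrt{m_{yy}}}\right|^2\right)^{1/2} \\
Q_1(a, b) &= \int_b^\infty x\exp\left[-\tfrac{1}{2}(a^2 + x^2)\right] I_0(ax)\, dx
\end{aligned} \tag{C-26}$$

and $I_n(x)$ is the modified Bessel function of the first kind and of order n.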

Let us evaluate the constants a and b when the channel is time-invariant, $\mu = 0$, and
the channel gain and phase estimates are those given in Section C-1. Recall that when
signal $s_{l1}(t)$ is transmitted, the matched filter output is $X_k = 2\mathcal{E} g_k + N_k$. The clairvoyant
estimate is given by (C-5). Hence, for this estimate, the moments are $\bar{X}_k = 2\mathcal{E} g_k$,
$\bar{Y}_k = g_k$, $m_{xx} = 4\mathcal{E}N_0$, and $m_{yy} = N_0/\mathcal{E}v$, where $\mathcal{E}$ is the signal energy, $N_0$ is the value of
the noise spectral density, and v is defined in (C-23). Substitution of these moments into
(C-26) results in the following expressions for a and b:

$$\begin{aligned}
a &= \sqrt{\tfrac{1}{2}\gamma_b}\,\left|\sqrt{v} - 1\right| \\
b &= \sqrt{\tfrac{1}{2}\gamma_b}\,\left|\sqrt{v} + 1\right| \\
\gamma_b &= \frac{\mathcal{E}}{N_0}\sum_{k=1}^{L}|g_k|^2
\end{aligned} \tag{C-27}$$

This is a result originally derived by Price (1962).

The probability of error for differential phase signaling can be obtained by setting
v = 1 in (C-27).
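
As a worked check of the differential phase signaling case, take the clairvoyant expressions $a = \sqrt{\gamma_b/2}\,|\sqrt{v}-1|$ and $b = \sqrt{\gamma_b/2}\,|\sqrt{v}+1|$ and the single-channel form of (C-25) with $\mu = 0$. Setting v = 1 then recovers the familiar exponential error rate. The steps below are a sketch under those assumptions and use $Q_1(0, b) = e^{-b^2/2}$ and $I_0(0) = 1$.

```latex
\begin{aligned}
v = 1 \;&\Rightarrow\; a = 0, \qquad b = \sqrt{\tfrac{1}{2}\gamma_b}\cdot 2 = \sqrt{2\gamma_b} \\
Q_1(0, b) &= \int_b^\infty x\, e^{-x^2/2}\, dx = e^{-b^2/2} = e^{-\gamma_b} \\
P_2 &= Q_1(0, b) - \tfrac{1}{2} I_0(0)\, e^{-\frac{1}{2}(0 + b^2)}
     = e^{-\gamma_b} - \tfrac{1}{2} e^{-\gamma_b}
     = \tfrac{1}{2} e^{-\gamma_b} \qquad (L = 1,\ \mu = 0)
\end{aligned}
```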

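Next, consider a pilot signal estimate. In this case, the estimate is given by (C-4) and
the matched filter output is again $X_k = 2\mathcal{E} g_k + N_k$. When the moments are calculated
and these are substituted into (C-26), the following expressions for a and b are
obtained:

$$\begin{aligned}
a &= \sqrt{\frac{\gamma_t}{2}}\,\left|\sqrt{\frac{r}{r+1}} - \sqrt{\frac{v}{r+1}}\right| \\
b &= \sqrt{\frac{\gamma_t}{2}}\,\left|\sqrt{\frac{r}{r+1}} + \sqrt{\frac{v}{r+1}}\right|
\end{aligned} \tag{C-28}$$

where

$$\gamma_t = \frac{\mathcal{E}_t}{N_0}\sum_{k=1}^{L}|g_k|^2, \qquad \mathcal{E}_t = \mathcal{E} + \mathcal{E}_p, \qquad r = \mathcal{E}/\mathcal{E}_p$$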


Finally, we consider the probability of a binary digit error for four-phase signaling
over a time-invariant channel for which the condition $\mu = 0$ obtains. One approach that
can be used to derive this error probability is to determine the pdf of $\theta$ and then to
integrate this over the appropriate range of values of $\theta$. Unfortunately, this approach
proves to be intractable mathematically. Instead, a simpler, albeit roundabout, method
may be used that involves the Laplace transform. In short, the integral in (14-4-14) of
the text that relates the error probability $P_2(\gamma_b)$ in an AWGN channel to the error
probability $P_2$ in a Rayleigh fading channel is a Laplace transform. Since the bit error
probabilities $P_2$ and $P_{4b}$ for a Rayleigh fading channel, given by (C-18) and (C-21),
respectively, have the same form but differ only in the correlation coefficient, it follows
that the bit error probabilities for the time-invariant channel also have the same form.
That is, (C-25) with $\mu = 0$ is also the expression for the bit error probability of a
four-phase signaling system with the parameters a and b modified to reflect the
difference in the correlation coefficient. The detailed derivation may be found in the
paper by Proakis (1968). The expressions for a and b are given in Table C-2.


TABLE C-2 TIME-INVARIANT CHANNEL

Two-phase signaling

  Clairvoyant estimate
    $a = \sqrt{\tfrac{1}{2}\gamma_b}\,|\sqrt{v} - 1|$
    $b = \sqrt{\tfrac{1}{2}\gamma_b}\,|\sqrt{v} + 1|$

  Differential phase signaling
    $a = 0$
    $b = \sqrt{2\gamma_b}$

  Pilot signal estimate
    $a = \sqrt{\tfrac{1}{2}\gamma_t}\,\left|\sqrt{r/(r+1)} - \sqrt{v/(r+1)}\right|$
    $b = \sqrt{\tfrac{1}{2}\gamma_t}\,\left|\sqrt{r/(r+1)} + \sqrt{v/(r+1)}\right|$

Four-phase signaling

  Clairvoyant estimate
    $a = \sqrt{\tfrac{1}{2}\gamma_b}\,\left|\sqrt{v + 1 + \sqrt{v^2+1}} - \sqrt{v + 1 - \sqrt{v^2+1}}\right|$
    $b = \sqrt{\tfrac{1}{2}\gamma_b}\,\left(\sqrt{v + 1 + \sqrt{v^2+1}} + \sqrt{v + 1 - \sqrt{v^2+1}}\right)$

  Differential phase signaling
    $a = \sqrt{\tfrac{1}{2}\gamma_b}\,\left(\sqrt{2+\sqrt{2}} - \sqrt{2-\sqrt{2}}\right)$
    $b = \sqrt{\tfrac{1}{2}\gamma_b}\,\left(\sqrt{2+\sqrt{2}} + \sqrt{2-\sqrt{2}}\right)$

  Pilot signal estimate
    $a = \sqrt{\dfrac{\gamma_t}{4(r+1)}}\,\left|\sqrt{v + r + \sqrt{v^2+r^2}} - \sqrt{v + r - \sqrt{v^2+r^2}}\right|$
    $b = \sqrt{\dfrac{\gamma_t}{4(r+1)}}\,\left(\sqrt{v + r + \sqrt{v^2+r^2}} + \sqrt{v + r - \sqrt{v^2+r^2}}\right)$





APPENDIX D


SQUARE-ROOT FACTORIZATION


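Consider the solution of the set of linear equations

$$\mathbf{R}_N\mathbf{C}_N = \mathbf{U}_N \tag{D-1}$$

where $\mathbf{R}_N$ is an N x N positive-definite symmetric matrix, $\mathbf{C}_N$ is an N-dimensional vector
of coefficients to be determined, and $\mathbf{U}_N$ is an arbitrary N-dimensional vector. The
equations in (D-1) can be solved efficiently by expressing $\mathbf{R}_N$ in the factored form

$$\mathbf{R}_N = \mathbf{S}_N\mathbf{D}_N\mathbf{S}_N' \tag{D-2}$$

where $\mathbf{S}_N$ is a lower triangular matrix with elements $\{s_{ik}\}$ and $\mathbf{D}_N$ is a diagonal matrix
with diagonal elements $\{d_k\}$. The diagonal elements of $\mathbf{S}_N$ are set to unity, i.e., $s_{ii} = 1$.
Then we have

$$\begin{aligned}
r_{ij} &= \sum_{k=1}^{j} s_{ik}d_ks_{jk}, \qquad 1 \le j \le i-1, \quad i \ge 2 \\
r_{11} &= d_1
\end{aligned} \tag{D-3}$$

where $\{r_{ij}\}$ are the elements of $\mathbf{R}_N$. Consequently, the elements $\{s_{ik}\}$ and $\{d_k\}$ are
determined from (D-3) according to the equations

$$\begin{aligned}
d_1 &= r_{11} \\
s_{ij}d_j &= r_{ij} - \sum_{k=1}^{j-1} s_{ik}d_ks_{jk}, \qquad 1 \le j \le i-1, \quad 2 \le i \le N \\
d_i &= r_{ii} - \sum_{k=1}^{i-1} s_{ik}^2d_k, \qquad 2 \le i \le N
\end{aligned} \tag{D-4}$$

Thus, (D-4) define $\mathbf{S}_N$ and $\mathbf{D}_N$ in terms of the elements of $\mathbf{R}_N$.

The solution to (D-1) is performed in two steps. With (D-2) substituted into (D-1)
we have

$$\mathbf{S}_N\mathbf{D}_N\mathbf{S}_N'\mathbf{C}_N = \mathbf{U}_N$$

Let

$$\mathbf{Y}_N = \mathbf{D}_N\mathbf{S}_N'\mathbf{C}_N \tag{D-5}$$

Then

$$\mathbf{S}_N\mathbf{Y}_N = \mathbf{U}_N \tag{D-6}$$

First we solve (D-6) for $\mathbf{Y}_N$. Because of the triangular form of $\mathbf{S}_N$, we have

$$\begin{aligned}
y_1 &= u_1 \\
y_i &= u_i - \sum_{j=1}^{i-1} s_{ij}y_j, \qquad 2 \le i \le N
\end{aligned} \tag{D-7}$$

Having obtained $\mathbf{Y}_N$, the second step is to compute $\mathbf{C}_N$. That is,

$$\begin{aligned}
\mathbf{D}_N\mathbf{S}_N'\mathbf{C}_N &= \mathbf{Y}_N \\
\mathbf{S}_N'\mathbf{C}_N &= \mathbf{D}_N^{-1}\mathbf{Y}_N
\end{aligned} \tag{D-8}$$

Beginning with

$$c_N = y_N/d_N$$

the remaining coefficients of $\mathbf{C}_N$ are obtained recursively as follows:

$$c_i = \frac{y_i}{d_i} - \sum_{j=i+1}^{N} s_{ji}c_j, \qquad i = N-1, N-2, \ldots, 1 \tag{D-9}$$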

The number of multiplications and divisions required to perform the factorization of
$\mathbf{R}_N$ is proportional to $N^3$. The number of multiplications and divisions required to
compute $\mathbf{C}_N$, once $\mathbf{S}_N$ is determined, is proportional to $N^2$. In contrast, when $\mathbf{R}_N$ is
Toeplitz the Levinson-Durbin algorithm should be used to determine the solution of
(D-1), since the number of multiplications and divisions is proportional to $N^2$. On the
other hand, in a recursive least-squares formulation, $\mathbf{S}_N$ and $\mathbf{D}_N$ are not computed as in
(D-3), but they are updated recursively. The update is accomplished with $N^2$ operations
(multiplications and divisions). Then the solution for the vector $\mathbf{C}_N$ follows the steps
(D-5)-(D-9). Consequently, the computational burden of the recursive least-squares
formulation is proportional to $N^2$.
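
The factorization and the two substitution steps can be sketched directly in Python. This is a minimal illustration using plain lists and no pivoting, so it relies on $\mathbf{R}_N$ being symmetric and positive definite. The function names `ldl_factor` and `ldl_solve` are hypothetical and are not taken from the text.

```python
def ldl_factor(R):
    """Return (S, d) with R = S * diag(d) * S', where S is unit lower triangular."""
    N = len(R)
    S = [[0.0] * N for _ in range(N)]
    d = [0.0] * N
    for i in range(N):
        S[i][i] = 1.0
        for j in range(i):
            # s_ij d_j = r_ij - sum_{k<j} s_ik d_k s_jk
            acc = R[i][j] - sum(S[i][k] * d[k] * S[j][k] for k in range(j))
            S[i][j] = acc / d[j]
        # d_i = r_ii - sum_{k<i} s_ik^2 d_k
        d[i] = R[i][i] - sum(S[i][k] ** 2 * d[k] for k in range(i))
    return S, d


def ldl_solve(R, U):
    """Solve R C = U by factorization, forward substitution, and back substitution."""
    S, d = ldl_factor(R)
    N = len(U)
    y = [0.0] * N
    for i in range(N):  # forward substitution: S Y = U
        y[i] = U[i] - sum(S[i][j] * y[j] for j in range(i))
    c = [0.0] * N
    for i in range(N - 1, -1, -1):  # back substitution: S' C = D^{-1} Y
        c[i] = y[i] / d[i] - sum(S[j][i] * c[j] for j in range(i + 1, N))
    return c


# Small example with known solution C = [1, 2, 3].
R = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
U = [14.0, 21.0, 26.0]
C = ldl_solve(R, U)
```

The nested loops in `ldl_factor` account for the cubic operation count of the factorization, while the two substitution loops in `ldl_solve` are each quadratic in N.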


K 
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full response, 192 

minimum-shift keying (MSK), 196—199 

modulation index, 191 

multiamplitude, 200-203 

multi-/t, 295 

partial response, 192 

phase cylinder, 195 

phase trees of, 192 

power spectrum of, 209-219 

representation of, 190-196 

signal space diagram for, 1 99-200 

state trellis, 1% 

trellis of, 195 

Continuously variable slope delta modulation (CVSD), 

135 

Convolutional codes, 470-51 1 
applications of, 506-511 
binary. 470-476 


Convolutional codes (Cont.): 
catastrophic error propagation, 482 
concatenated, 492, 499-500 
constraint length, 470 
decoding, 483-486 
Fano algorithm, 500-503 
feedback, 505-506 
sequential, 500-502 
stack algorithm, 503-504 
Viterbi, 483-486 
distance properties of, 492-496 
dual-lc, 492-499 
encoder, 470-478 
generators, 471-472 
hard-decision decoding, 489-492 
minimum free distance. 479 
nonbinary, 492-499 
optimum decoding of, 483-485 
performance on AGWN channel, 486-492 
performance on BSC, 489-491 
performance on Rayleigh fading channel, 81 1-814 
quantized metrics, 508-510 
soft-decision decoding, 486-489 
state diagram, 474-477 

table of generators for maximum free distance, 493-497 
transfer function, 477-480 
tree diagram, 472 
trellis diagram. 473 
Correlation demodulator, 234-238 
metrics for, 246 
Correlative state vector, 286 
Coset, 447 
Coset leader, 447 
Covariance, 34 
Covariance function, 65 
Cross-correlation function, 65 
Cross-power density spectrum. 68 
Cumulative distribution function (cdf). 23 
Cutoff rate, 394 

comparison with channel capacity, 399-400 
for binary coded signals. 396 
for Af-ary input. A/ -ary output vector channel. 403 
for multiamplitude signals, 397-399 
for noncoherent! channel, 405-406 
for q-ary input Q-ary output channel, 400-401 
system design with. 400-406 
CW jamming, 706 

Cyclic codes (see Bloc l codes, cyclic) 

Cyclostationary process, 75-76, 205 

Data compression, 1 
Data translation codes. 566 

Decision-feedback equalizer (see Equalizers, decision- 
feedback) 
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Decoding of block codes: 
for fading channels: hard-decision, 81 1 
soft -decision, 808-8) ! 
hard-decision, 445-456 
bounds on performance for BSC. 452-455 
Chernoff bound, 455 
syndrome, 449-451 
table lookup method, 447-448 
soft-decision, 456-445 

bounds on performance for AWGN. 440-443 
comparison with hard-decision decoding, 
456-461 

Decoding of convolutional codes: 
for fading channel, performance, 8] 1-814 
feedback, 505-506 
hard-decision, 489-492 
performance on AWGN channel, 486-492 
performance on BSC. 489-491 
sequential, 500-502 
soft decision, 486-489 
stack algorithm, 503-504 
Viterbi algorithm, 483-486 
Delay distortion, 535 
Delay power spectrum, 762 
Delta modulation (see Source, encoding) 
Demodulation/Detection 
carrier recovery for, 337-358 
Costas loop. 355-356 
decision-directed, 347-350 
ML methods, 339-341 
non-decision-directed, 350-358 
squaring PLL, 353-355 
coherent: 

of binary signals, 257-260 
of biorthogonal signals, 264 -•266 
comparison of, 282-284 
of DPSK signals, 274-278 
of equicorrelated signals, 266 
of Af-ary binary coded signals, 266-267 
optimum, 244-257 
of orthogonal signals, 260-264 
of PAM signals, 267-269 
of PSK signals, 269-274 
of QAM signals, 278-282 
correlation-type, 234-238 
of CPFSK, 284-289 
performance, 289-301 
for intersymbol interference, 584-627 
matched filter-type, 238-244 
maximum-likelihood, 244-254 
maximum likelihood sequence, 249-254 
noncoherent, 302-313 
of binary signals, 302-308 
of Af-ary orthogonal signals, 308-312 
multichannel, 680-686 


Demodulation/Detection (Conr.): 
noncoherent (Coal.): 
optimum, 302-312 
symbol-by-symbol, 254-256 
Differential encoding, 187 
Differential entropy, 92 
Differential phase-shift keying (DPSK), 

274-278 

Digital communication system model, 1-3 
Digital modulator, 2 

Direct sequence (see Spread spectrum signals) 

Discrete memoryless channel (DMC). 376-377 
Discrete random variable, 23 
Distance (see Block codes; Convolutional codes, 
minimum free distance) 

Distortion (See also Channel distortion): 
from quantization, 113-125 
granular noise, 134 
slope overload, 134 
Distortion rale function, lit) 

Distributions (see Probability distributions) 

Diversity: 
antenna, 777 
frequency, 777 
performance of, 777-795 
polarization, 778 
RAKE, 778 
time. 111 , 

Double-sideband modulation, 176 
DPCM (Differential pulse code modulation) (see Source, 
encoding) 

DPSK (differential phase-shift keying). 274-278 
Dual code, 426 
Dual-k codes, 492-499 
Duobinary signal, 548-549 

Early-late gate synchronizer, 362-365 
Effective antenna area, 316 
Effective radiated power, 316 
Eigenvalue, 164 
Eigenvector, 164 
Elias bound, 461-463 

Encoding (see Block codes; Conventional codes) 

Energy, 156 

Ensemble averages. 64-65 
Entropy, 88 
conditional, 88 
differentia). 92 

discrete memoryless sources, 94-103 
discrete stationary sources, 103-106 
Entropy coding, 96, 117 
Envelope, 155 
Envelope detection, 306 
Equalizers (See also Adaptive equalizers) 
decision-feedback, 621-627. 649-650 
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Equalizers (Gw.): 
decision-feedback (Com.): 
adaptive. 649-652 
examples of performance, 622-623 
of trellis coded signals, 650-652 
minimum MSE, 622 
predictive form. 626-627 
linear, 601-620, 648-649 
adaptive, 636-644 

convergence of MSE algorithm, 642-644 
error probability, 613-617 
examples of performance, 613-617 
excess MSE, 644-648 
fractionally spaced, 617-620 
LMS (MSE) algorithm, 639-642 
limit on step size, 645-646 
mean-square error (MSE) criterion, 607-620 
minimum MSE, 610-611 
output SNR for, 605, 610 
peak distortion, 602 
peak distortion criterion, 602-607 
zero-forcing, 603-604, 637-638 
maximum-likelihood sequence estimation, 584-586, 589-593, 607-616
self-recovering (blind), 644-675
with trellis-coded modulation, 650-652 
using the Viterbi algorithm, 589-593 
channel estimator for, 652-654 
performance of, 593-601 
Equivalent codes, 418 

Equivalent lowpass impulse response, 157-158 
Equivalent lowpass signal, 155 
Equivocation, 90 
Error function, 40
Error probability: 
coherent demodulation: 
binary coded, 266-267 
for binary signals, 257-260 
for DPSK, 274-278
for M-ary biorthogonal, 264-265
for M-ary equicorrelated, 266
for M-ary orthogonal, 260-263
for M-ary PAM, 267-269
for PSK, 269-274
for QAM, 278-282
union bound for, 263-264
multichannel, 680-686
noncoherent demodulation, 301-313
for binary signals, 301-308
for M-ary orthogonal, 308-312
Estimate: 
biased, 367 
consistent, 59, 368 
efficient, 368 


Estimate (Cont.):
unbiased, 367 

Estimate of phase (See also Carrier phase estimation) 
clairvoyant, 889 
pilot signal, 889 

Estimation, maximum-likelihood sequence (MLSE), 249-254

Estimation: 

maximum likelihood, 334-335 
of carrier phase, 337-358 
of signal parameters, 333-335 
of symbol timing, 358-365 
of symbol timing and carrier phase, 365-371 
performance of, 367-370 
Euclidean: 
distance, 251 
weight, 595 
Events, 18 
intersection of, 19 
joint, 19 

mutually exclusive, 19 
null, 19 

probability of, 19 
union of, 19 
Excess bandwidth, 546 
Excess MSE, 644-648 
Expected value, 33 
Expurgated codes, 816-817
Extended code, 420 
Extension field, 415 
Eye pattern, 541 


Fading channels, 8, 758-839 (See also Channels)
Feedback decoding, 505-506 

FH spread spectrum signals (see Spread spectrum signals)
Filter: 

integrator, 238 
matched, 239 
Folded spectrum, 606 
Follower jammer, 731 
Fourier transform, 35 
Free Euclidean distance, 517
Free-space path loss, 317 
Frequency diversity, 777

Frequency division multiple access (FDMA), 842-844
Frequency-hopped (FH) spread spectrum (see Spread spectrum signals)

Frequency-shift keying (FSK), 181-183, 190-191
continuous-phase (CPFSK): performance of, 284-301 
power density spectrum of, 213-217 
representation of, 190-191 
Functions of random variables, 28-32
Galois field, 415
Gamma function, 42 
Gaussian distribution, 39-41 
multivariate, 49-52 
Gaussian noise, 11
Gaussian random process, 65 

Gaussian random variables, linear transformation of, 
50-52 

Generator matrix, 417 
Generator polynomial, 424 
Gilbert-Varshamov bound, 463
Golay codes, 423, 433
extended, 423

generator polynomial of, 433 
performance on AWGN channel, 454-455 
Gold sequences, 727 
Gram-Schmidt procedure, 167-173
Granular noise, 134 
Gray encoding, 175 

Hadamard codes, 422-423, 817-821
Hamming bound on minimum distance, 462
Hamming codes, 421-422, 433
Hamming distance, 415
Hard-decision decoding:
block codes, 445-456
convolutional codes, 489-492
Hilbert transform, 154
Huffman coding, 96-103 

Illumination efficiency factor, 317 
Impulse noise, 538 
Impulse response, 68
Independent events, 21 
Independent random variables, 28 
Information, 84-85
equivocation, 90
measure of, 84-91
mutual, 84 
average, 87 
self-, 85 

average (entropy), 88 
sequence, 3, 83 
Interleaving, 468-470 
block, 469 
convolutional, 470 
Intersymbol interference, 536-537 
controlled (see Partial response signals) 
discrete-time model for, 586-589
equivalent white noise filter model, 588 
optimum demodulator for, 584-593 
Inverse filter, 603


Jacobian, 32 
Jamming margin, 707 

Joint cdf (cumulative distribution function), 25 
Joint pdf (probability density function), 25
Joint processes, 65 

Kalman (RLS) algorithm, 656-658 
fast, 660 

Kasami sequences, 729
Kraft inequality, 97-98

Laplace probability density function, 56 
Lattice: 
filter, 660-664 
recursive least squares, 664
Law of large numbers (weak), 59 
Least favorable pdf, 305 
Least-squares algorithms, 654-664 
Lempel-Ziv algorithm, 106-108 
Levinson-Durbin algorithm, 128, 139, 879-881
Likelihood ratio, 304 
Line codes, 566 

Linear codes (see Block codes, linear; Convolutional codes)

Linear equalization (see Equalizers, linear) 
Linear-feedback shift-register, maximal length, 433-435, 724-727

Linear prediction, 128-130, 138-144, 660-664
backward, 661-662
forward, 661-662
residuals, 663 

Linear predictive coding (LPC): 
speech, 138-144

Linear time-invariant system, 68-69 
response to stochastic input, 68-72 
Linear transformation of random variables, 28-29, 50-52
Link budget analysis, 316-319
Link margin, 319
Lloyd-Max quantizer, 113
Lowpass signal, 155 
Lowpass system, 157 

Low probability of intercept, 696, 715-716

Magnetic recording, 567-568 
normalized density, 567 
Majority logic decoder, 506
Mapping by set partitioning, 512 
Marginal probability density, 26 
Marcum's Q-function, 44
Markov chain, 189 
transition probability matrix of, 189 
Matched filter, 238-244
Maximal ratio combining, 779
performance of, 780-782
Maximum a posteriori probability (MAP) criterion, 245, 254-257

Maximum free distance codes, tables of, 492-496
Maximum length shift-register codes, 433-435, 724-727 
Maximum likelihood: 
parameter estimation, 333-335, 339-341 
for carrier phase, 339-341 
for joint carrier and symbol, 365-367 
for symbol timing, 358-364 
performance of, 367-370 
Maximum-likelihood criterion, 245-246
Maximum-likelihood receiver, 233-257
Maximum-likelihood sequence estimation (MLSE), 249-254

Mean-square error (MSE) criterion, 607-617 
Mean value, 33 

Microwave LOS channel, 768-769 
Miller code, 188, 575 
Minimum distance: 
bounds on, 461-464 
definition, 416 
Euclidean, 173 
Hamming, 416 

Minimum-shift keying (MSK), 196-199 
power spectrum of, 213-219 
Models: 

channel, 375-386
source, 82-84, 93-95 
Modified duobinary signal, 549-550 
Modulation: 
binary, 257-260 
biorthogonal, 264-266 
comparison of, 282-284 
continuous-phase FSK (CPFSK), 190-191
power spectrum, 213-219 
DPSK, 274-278 
equicorrelated (simplex), 266 
index, 191 
linear, 174-186 
power spectrum of, 204-209 
M-ary orthogonal, 260-264
multichannel, 680-686 
nonlinear, 190-203 
offset QPSK, 198 
PAM (ASK), 267-269 
PSK, 269-274 
QAM, 278-282 

Modulation codes, 566-576 (See also Partial response 
signals) 

capacity of, 569 
Miller code, 573 
NRZ, 574 

NRZI, 566, 568, 574-575
run-length limited, 568-576 


Modulation codes (Cont.):
run-length limited (Cont.):
fixed rate, 572 
state dependent, 571 
state independent, 571 
Modulator: 
binary, 2 
digital, 2
M-ary, 2
Moments, 33 
Morse code, 13
Multicarrier communications:
capacity of, 687-689 
FFT-based system, 689-692 
Multichannel communications, 680-686 
with binary signals, 682-684 
with M-ary orthogonal signals, 684-686
Multipath channels, 8, 758-839
Multipath intensity profile, 762 
Multipath spread, 763 
Multiple access methods, 840-849 
capacity of, 843-849 
CDMA, 843, 849-862 
FDMA, 842

random access, 862-872
TDMA, 842 

Multiuser communications, 840-872 
Multivariate gaussian distribution, 49-52 
Mutual information, 84 
average, 87-88
Mutually exclusive events, 18 

Narrowband interference, 704-706 
Narrowband process, 152 
carrier frequency of, 153 
Narrowband signal, 152 
Noise: 

gaussian, 162 
white, 162-163 

Noisy channel coding theorem, 386-387 
Noncoherent combining loss, 683-684
Nonlinear distortion, 537 
Nonlinear modulation, 190 
Nonstationary stochastic process, 63 
Norm, 165 

Normal equations, 128 

Normal random variables (see Gaussian distribution)

Null event, 18 

Null space, 416 

Nyquist criterion, 542-547 

Nyquist rate, 14, 72 

Offset quadrature PSK (OQPSK), 198 
On-off signalling (OOK), 321
Optimum demodulation (see Demodulation/Detection)
Orthogonal signals, 165-166

Orthogonality principle, mean-square estimation, 608
Orthonormal: 
expansion, 165-173 
functions, 165-166 

Parity check, 417 
matrix, 419 

Parity polynomial, 426 
Partial-band interference, 734-741 
Partial response signals, 548-560 
duobinary, 548-549 
error probability of, 562-565 
modified duobinary, 549 
precoding for, 551-555 
Partial-time (pulsed) jamming, 717-724 
Peak distortion criterion, 602-607 
Peak frequency deviation, 190 
Perfect codes, 453-454 

Periodically stationary, wide sense, 75-76, 205 
Phase jitter, 538 

Phase-locked loop (PLL), 341-346 
Costas, 355-356
decision-directed, 347-350
M-law type, 356-358
non-decision-directed, 350-351
square-law type, 353-355
Phase-shift keying (PSK), 177-178, 269-274 
adaptive reception of, 887-896
pdf of phase, 270-271
performance for AWGN channel, 271-274
performance for Rayleigh fading channel, 780-787, 887-894

Plotkin bound on minimum distance, 462 
Power density spectrum, 67-68, 204-223
at output of linear system, 69 
of digitally modulated signals, 204-223 
Prediction (see Linear prediction) 

Preferred sequences, 727 
Prefix condition, 96

Probability: 
a priori, 21 
a posteriori, 21 
conditional, 20, 26-28 
of events, 18 
joint, 19, 25-26 

Probability density function (pdf), 24 
Probability distribution function, 23
Probability distributions, 37-52 
binomial, 37-38 
chi-square, 41-45 
central, 42-43 
noncentral, 42-44 


Probability distributions (Cont.):
gamma, 43 
gaussian, 39-41 
multivariate gaussian, 49-52 
Nakagami, 48-49 
Rayleigh, 45-46 
Rice, 47-48 
uniform, 39 

Probability transition matrix, 377 
Processing gain, 707 
Pseudo-noise (PN) sequences: 
autocorrelation function, 725-726 
generation via shift register, 724-729 
Gold, 727
Kasami, 729 

maximal-length, 725-726 
peak cross-correlation, 726-727 
preferred, 727 

(See also Spread spectrum signals) 

Pulse amplitude modulation (PAM), 174-176, 267-269 
Pulse code modulation (PCM), 125-133 
adaptive (ADPCM), 131-133
differential (DPCM), 127-129
Pulsed interference, 717 
effect on error rate performance, 717-724 

Quadrature amplitude modulation (QAM), 178-180, 278-282

Quadrature components, 155
of narrowband process, 155-156 
properties of, 161-162 
Quantization, 108-125 
block, 118-125 

optimization (Lloyd-Max), 113-118
scalar, 113-118
vector, 118-125
Quantization error, 125-133 
Quasiperfect codes, 454 

Raised cosine spectrum, 546 
excess bandwidth, 546 
rolloff parameter, 546 
RAKE correlator, 797-798 
RAKE receiver: 

for binary antipodal signals, 798-803 
for binary orthogonal signals, 801-802 
for DPSK signals, 804 

for noncoherent detection of orthogonal signals, 805
RAKE matched filter, 799-800 
Random access, 862-872 
ALOHA, 863-867 
carrier sense, 867-872 
with collision detection, 868
nonpersistent, 868
Random access (Cont.):
carrier sense (Cont.):

1-persistent, 869
p-persistent, 869
offered channel traffic, 864 
slotted ALOHA, 864 
throughput, 865-867 
unslotted, 864
Random coding, 399-400
binary coded signals, 390-397 
multiamplitude signals, 397-399 
Random Processes (see Stochastic processes) 

Random variables, 22-28 
function of, 28-32 
multiple, 25
orthogonal, 35
single, 22-24

statistically independent, 28
sums of, 58-63 
central limit theorem, 61-62 
transformation of, 28-32 
Jacobian of, 32 
linear, 28, 32, 49-52 
uncorrelated, 34 
Rate: 

code, 2, 414 

of encoded information (see Source encoding) 

Rate distortion function, 108-113 
of bandlimited gaussian source, 112
of memoryless gaussian source, 109-110 
table of, 112 

Rayleigh distribution, 45-46

Rayleigh fading (see Channel, fading multipath; Channel, 
Rayleigh fading) 

Reciprocal polynomial, 425
Recursive least squares (RLS) algorithms, 654-664
fast RLS, 660
RLS Kalman, 656-660
RLS lattice, 660-664
Reed-Solomon codes, 464-466
References, 899-916 
Reflection coefficients, 140 
Regenerative repeaters, 314-316 
Residuals, 663 
Rice distribution, 47-48 
Ricean fading channel, 761 
Run-length limited codes, 568-576 
fixed rate, 572 
state dependent, 571 
state independent, 571 

Sample function, 63 
Sample mean, 58 
Sample space, 17-18


Sampling theorem, 72-73 
Scattering function, 766 
Self-information, 85 
average (entropy), 88 
Sequential decoding, 501-503 
Set partitioning, 512 
Shannon limit, 264 
Shortened code, 421 
Signal constellations: 

PAM, 174-176
PSK, 177-178 
QAM, 178-180 
Signal design, 540-576 
for band-limited channel, 540-551 
for channels with distortion, 557-560
for no intersymbol interference, 540-547 
with partial response pulses, 548-551 
with raised cosine spectral pulse, 546-547 
Signal-to-noise ratio (SNR), 258 
Signals: 

bandpass, 152-157
baseband, 176, 186-189 
binary antipodal, 257 
binary coded, 266-267 
binary orthogonal, 258 
biorthogonal, 183-184, 264-266
carrier of, 159

characterization of, 152-163 
complex envelope of, 155 
digitally modulated, 173-209 
cyclostationary, 204-206 
representation of, 173-202 
spectral characteristics of, 202-223 
discrete time, 74-76 
energy of, 156 
envelope of, 155 
equivalent lowpass, 155 
lowpass, 155 

M-ary orthogonal, 181-183
multiamplitude, 174-176 
multidimensional, 180-181 
multiphase, 177-178 
narrowband, 152 

optimum demodulation of, 233-257 
quadrature amplitude modulated (QAM), 178-180 
quadrature components of, 155-156 
properties of, 161-162
simplex, 184, 266
speech, 143-144 
stochastic, 62-77, 159-163 
autocorrelation of, 64, 68-70, 75-76 
autocovariance, 64 
bandpass stationary, 159-163
cross correlation of, 65 





Signals (Cont.):
stochastic (Cont.):
ensemble averages of, 64-65
power density spectrum, 67-68, 204-223 
properties of quadrature components, 161-162 
white noise, 162-163 
Signature sequence, 843
Simplex signals, 266 
Single-sideband modulation, 176 
Skin depth, 9 

Slope overload distortion, 134 
Soft decision decoding: 
block codes, 436-445 
convolutional codes, 486-489 
Source: 

analog, 82-83 
binary, 83 

discrete memoryless (DMS), 82-83 
discrete stationary, 103-106 
encoding, 93-144
adaptive DM, 135-136
adaptive DPCM, 131-133
adaptive PCM, 131-133
delta modulation (DM), 133-136 
differential pulse code modulation (DPCM), 127-129 
discrete memoryless, 94-103 
Huffman, 99-103 
Lempel-Ziv, 106-108 
linear predictive coding (LPC), 138-142
pulse code modulation (PCM), 125-127 
models, 82-84 
speech, 143-144 
spectral, 136-138 
waveform, 125-144 
Source coding, 82-144 

Spaced-frequency, spaced-time correlation function, 763
Spectrum: 

of CPFSK and CPM, 209-219 
of digital signals, 203-223
of linear modulation, 204-209
of signals with memory, 220-223
Spread factor, 771
table of, 771

Spread spectrum multiple access (SSMA), 716
Spread spectrum signals:
acquisition of, 744-748
for antijamming, 712-715

for code division multiple access (CDMA), 696, 716-717, 741-743

concatenated codes for, 711-712, 740-741 
direct sequence, 697-700 
applications of, 712-717 
coding for, 710-712 


Spread spectrum signals (Cont.):
direct sequence (Cont.):
demodulation of, 701-702 
performance of, 702-712 
with pulse interference, 717-724 
examples of DS, 712-717 
frequency-hopped (FH), 729-743
block hopping, 731 
follower jammer for, 731 
performance of, 732-734 
with partial band interference, 734-741
hybrid combinations, 743-744
for low-probability of intercept (LPI), 696, 715-716
for multipath channels, 795-806 
synchronization of, 744-752 
time-hopped (TH), 743
tracking of, 748 
uncoded PN, 708 

Spread spectrum system model, 697-698
Square-law detection, 306 
Square-root factorization, 660, 897-898 
Staggered quadrature PSK (SQPSK), 198
State diagram, 196, 474-477
Stationary stochastic processes, 63-64
strict-sense, 63-64 
wide-sense, 64 
Statistical averages, 64-67 
Steepest-descent (gradient) algorithm, 639-642
Stochastic process, 62-72, 159-163 
cyclostationary, 75-76 
discrete-time, 74-76 
narrowband, 159 
nonstationary, 63 
strict-sense stationary, 63-64
wide-sense stationary, 64 
Storage channel, 10 
Strict-sense stationary, 63-64
Subband coding, 137
Symbol interval, 174
Synchronization: 
carrier, 337-358 
effect of noise, 343-346 
for multiphase signals, 356-358 
with Costas loop, 355-356 
with decision-feedback loop, 347-350 
with phase-locked loop (PLL), 341-346 
with squaring loop, 353-355 
of spread spectrum signals, 744-752 
sliding correlator, 747 
symbol, 336-337 
Syndrome, 446 
Syndrome decoding, 446-451 
System, linear, 68-72 
autocorrelation function at output, 69
System, linear (Cont.):
bandpass, response of, 157-159 
power density spectrum at output, 69-70 
Systematic code, 418 

Tail probability bounds, 53-57 
Chebyshev inequality, 53-54 
Chernoff bound, 54-57

TATS (tactical transmission system), 741-743 
Telegraphy, 13 

Telephone channels, 4, 536-538
Thermal noise, 3, 11
Threshold decoder, 506 
Time diversity, 777 

Time division multiple access (TDMA), 842-844 
Toeplitz matrix, 879 
Transfer function: 

of convolutional code, 477-483
of linear system, 68-72 

Transformation of random variables, 29-32, 49-52 
Transition probabilities, 189 
Transition probability matrix, 189 
for channel, 375-378 
for delay modulation, 189-190 
Tree diagram, 192-195, 471-472 
Trellis-coded modulation, 511-526
free Euclidean distance, 517
subset decoding, 519
tables of coding gains for, 522-523
Trellis diagram, 473 

Uncorrelated random variables, 34
Uniform distribution, 39 


Union bound, 263-264, 387-389 
Union of events, 18 
Uniquely decodable, 96 
Universal source coding, 106 


Variable-length encoding, 95-103 

Variance, 33 

Vector space, 163-165 

Vector quantization, 118-125 

Viterbi algorithm, 251, 287-289, 483-486 

Vocal tract, 141-143 

Voltage-controlled oscillator (VCO), 341-343 


Weak law of large numbers, 59 
Weight: 

of code word, 414 
distribution, 414 
for Golay, 423 
Welch bound, 728 
White noise, 162-163 
Whitening filter, 587-588 
Wide-sense stationary, 64 
Wiener filter, 14 


Yule-Walker equations, 128


Z transform, 587 
Zero-forcing equalizer, 602-605 
Zero-forcing filter, 603-604 



